University of Minnesota

Clustered binary data with a large number of covariates have be- 
come increasingly common in many scientific disciplines. This paper 
develops an asymptotic theory for generalized estimating equations 
(GEE) analysis of clustered binary data when the number of covari- 
ates grows to infinity with the number of clusters. In this "large n, 
diverging p" framework, we provide appropriate regularity conditions 
and establish the existence, consistency and asymptotic normality of 
the GEE estimator. Furthermore, we prove that the sandwich vari- 
ance formula remains valid. Even when the working correlation ma- 
trix is misspecified, the use of the sandwich variance formula leads 
to an asymptotically valid confidence interval and Wald test for an 
estimable linear combination of the unknown parameters. The ac- 
curacy of the asymptotic approximation is examined via numerical 
simulations. We also discuss the "diverging p" asymptotic theory for 
general GEE. The results in this paper extend the recent elegant 
work of Xie and Yang [Ann. Statist. 31 (2003) 310-347] and Balan 
and Schiopu-Kratina [Ann. Statist. 32 (2005) 522-541] in the "fixed 
p" setting. 

1. Introduction. A fundamental problem in statistical analysis is to characterize
the effects of a set of covariates $X_1, \ldots, X_p$ on a response variable $Y$
based on a sample of size n. Recently, there has been considerable interest in 
investigating this problem in the so-called "large n, diverging p" asymptotic 
framework, where the dimension of the covariates increases to infinity with 
the sample size. This setup allows statisticians to adopt a more complex sta- 
tistical model as more abundant data become available, and thus to reduce 
the modeling bias. 
The "large n, diverging p" framework can be traced back to the earlier
pioneering work on M-estimators with a diverging number of parameters; see
Huber (1973), Portnoy (1984, 1985, 1988), Mammen (1989), Welsh (1989), 
Bai and Wu (1994), He and Shao (2000) and the references therein. With 
the advent of high-dimensional data in many scientific areas, statistical the- 
ory developed in this new framework has become crucial for guiding prac- 
tical data analysis with high-dimensional covariates, which relies heavily 
on asymptotic theory to justify its validity. By allowing the covariates' dimension
to increase with the sample size, Fan and Peng (2004) studied nonconcave
penalized likelihood; Lam and Fan (2008) investigated profile-kernel
likelihood inference with generalized varying coefficient partially linear models;
Huang, Horowitz and Ma (2008) explored bridge estimators in linear
regression; Hjort, McKeague and Van Keilegom (2009) and Chen, Peng and
Qin (2009) studied the effects of data dimension on empirical likelihood; Zou
and Zhang (2009) studied the adaptive elastic net; Zhu and Zhu (2009) investigated
parameter estimation in a semiparametric regression model with
highly correlated predictors. In the aforementioned literature, the number of
covariates $p$ grows to infinity at a polynomial rate $o(n^{\alpha})$ for some $0 < \alpha < 1$.
In particular, most of these papers provide necessary conditions under which 
classical asymptotic theories remain valid for a in the range 

A different line of research considers the case where p can be much larger 
than n and even grow at an exponential rate of n, in which case the sparsity 
assumption and other more stringent regularity conditions are generally re- 
quired to investigate the large-sample properties. Furthermore, it is worth 
noting that much work has also been devoted to classification and multi- 
ple hypotheses testing problems with high-dimensional covariates, but these 
problems are different in nature from what is discussed in this paper. We 
refer to the review papers of Donoho (2000), Fan and Li (2006) and Fan 
and Lv (2010) for more comprehensive references on high-dimensional data 
analysis. 

When the research focus is on modeling the relationship between Y and 
a high-dimensional vector of covariates, the existing literature in the "large 
n, diverging p" setting has been largely restricted to independent data. In 
many modern data sets, in addition to the large dimensionality of covariates, 
complexity also arises when the responses are correlated due to repeated 
measures or clustered design. One representative example is the Framing- 
ham Heart Study, where the researchers are interested in linking common 
risk factors to the occurrence of cardiovascular diseases. In this study, many 
variables, such as age, smoking status, cholesterol level and blood pressure, 
were recorded for the participants during their clinic visits over the years to 
describe their physical characteristics and lifestyles. Another example is the 
Chicago Longitudinal Study in social science, which investigated the educa- 
tional and social development of about 1500 low income, minority youths in 
the Chicago area. The study collected a large amount of information on many
variables that measure children's early antisocial behavior, individual-level 
attributes of the child, family attributes and social characteristics of both 
the child and the family, among others. In some other examples of clustered 
data, the number of variables measured for each individual or experimental 
unit may not be many, but when one considers various interaction effects, 
the actual number of predictors in the statistical model can still be large 
and better fits the "large p" setup. 

The intrinsic complexity of clustered data raises challenging issues for 
statistical analysis, especially for correlated non-Gaussian data where it is 
difficult to specify the full likelihood. In this paper, we establish the asymp- 
totic properties of generalized estimating equations (GEE), a semiparametric 
procedure widely used in practice for clustered data analysis, while allowing 
the covariate dimension to grow to infinity with the sample size. 

The GEE procedure was introduced in a seminal paper of Liang and 
Zeger (1986) as a useful extension of generalized linear models [McCullagh 
and Nelder (1989)] to correlated data. Instead of specifying the full likeli- 
hood, it only requires the knowledge of the first two marginal moments and 
a working correlation matrix. Thus, it is particularly effective for model- 
ing clustered binary or count data. A key advantage of the GEE approach is 
that it yields a consistent estimator (in the classical "large n, fixed p" setup), 
even if the working correlation structure is misspecified. The GEE estima- 
tor is also asymptotically efficient if the correlation structure is indeed cor- 
rectly specified. The original paper of Liang and Zeger focused mostly on the 
methodology development. Li (1997) adopted a minimax approach to study 
the consistency of GEE. A more complete and systematic large-sample the- 
ory for GEE, including consistency and asymptotic normality, was elegantly 
established by Xie and Yang (2003). Balan and Schiopu-Kratina (2005) also 
rigorously studied a closely related pseudo-likelihood framework for GEE. 
However, these papers all assume that p is fixed and that the number of clus- 
ters n goes to infinity. Xie and Yang (2003) also considered the case where the 
cluster size (number of observations within each cluster) is itself large, which 
corresponds to a large number of time points in the longitudinal setting. 

This paper examines the effect of high-dimensional covariates on the GEE 
estimator in the "large n, diverging p" setup, where p = Pn is a function 
of the sample size n. We focus on clustered binary data because binary re- 
sponse (e.g., disease status) is ubiquitous in many scientific applications and 
because of the relative transparency of technical derivation. We also discuss 
the related theory for general GEE in Section 5.1. The main technical challenges
come from the high dimensionality of the covariates, the dependence
among observations within each cluster and the nuisance parameters in the 
working correlation matrix. We provide a self-contained derivation and ex- 
tend earlier theory in the literature on M-estimation with a large number of 
parameters, which is not tailored for clustered data and generally has not
considered nuisance parameters. 

We aim to answer the following essential questions. To what extent can the 
asymptotic results derived in the classical asymptotic framework for GEE 
still be deemed trustworthy when the number of covariates is large? How 
large can $p_n$ be (relative to $n$)? The main findings in this paper reveal that,
under reasonable conditions, the GEE estimator $\hat\beta_n$ is $\sqrt{n/p_n}$-consistent
when $p_n^2/n \to 0$ and that an arbitrary linear combination $\alpha_n^T(\hat\beta_n - \beta_{n0})$ is
asymptotically normal when $p_n^3/n \to 0$, where $\beta_{n0}$ is the true parameter
value. These findings resonate with those in the literature for independent 
data in the "large p" setting. Moreover, we also verify that the desirable ro- 
bustness property against working correlation matrix misspecification still 
holds and that both the sandwich variance formula and the large-sample 
Wald test still remain valid in this new context. Understanding these fun- 
damental questions is essential to justifying asymptotic statistical inference 
based on GEE for analyzing real-world clustered data containing many co- 
variates, such as the validity of the confidence intervals provided by the GEE 
package in R, SAS and other statistical software packages. 

The rest of the paper is organized as follows. In Section 2, we provide 
a brief review of the GEE procedure for analyzing clustered binary data. 
Section 3 establishes the consistency and asymptotic normality of the GEE 
estimator, the consistency of the sandwich variance formula and the validity 
of the large-sample Wald test in the "large n, diverging p" framework. Sec- 
tion 4 examines the asymptotic results via numerical simulations. Section 5 
discusses general GEE and related problems. 

2. Generalized estimating equations. For the $j$th observation of the $i$th
cluster, we observe a binary response variable $Y_{ij}$ and a $p_n$-dimensional
vector of covariates $X_{ij}$, $i = 1, \ldots, n$ and $j = 1, \ldots, m_i$. Observations from
different clusters are independent, but those from the same cluster are correlated.
Let $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$ denote the vector of responses for the $i$th
cluster and let $X_i = (X_{i1}, \ldots, X_{im_i})^T$ be the associated $m_i \times p_n$ matrix of
covariates.

The marginal regression approach of GEE assumes that $E(Y_{ij}|X_{ij}) = \pi_{ij}$
and $\mathrm{Var}(Y_{ij}|X_{ij}) = \pi_{ij}(1 - \pi_{ij})$, where a dispersion parameter may be added
in the marginal variance function if overdispersion is suspected to be present.
Furthermore, it relates the covariates to the marginal mean by specifying
that

(2.1) $\mathrm{logit}(\pi_{ij}) = X_{ij}^T\beta_n$,

where $\mathrm{logit}(\pi_{ij}) = \log(\pi_{ij}/(1 - \pi_{ij}))$ is the link function and $\beta_n$ is a $p_n$-dimensional
vector of parameters. The true unknown parameter value is denoted by $\beta_{n0}$.

Let $\pi_i(\beta_n) = (\pi_{i1}(\beta_n), \ldots, \pi_{im_i}(\beta_n))^T$, where
$\pi_{ij}(\beta_n) = \exp(X_{ij}^T\beta_n)/[1 + \exp(X_{ij}^T\beta_n)]$. Further, let $A_i(\beta_n)$ be the $m_i \times m_i$ diagonal matrix with the
$j$th diagonal element $A_{ij}(\beta_n) = \pi_{ij}(\beta_n)(1 - \pi_{ij}(\beta_n))$, $j = 1, \ldots, m_i$. In what
follows, we assume $m_i = m < \infty$, for simplicity. Liang and Zeger (1986)
suggested estimating $\beta_{n0}$ by solving the following generalized estimating
equation in $\beta_n$:

(2.2) $\sum_{i=1}^{n} X_i^T A_i(\beta_n) V_i^{-1} (Y_i - \pi_i(\beta_n)) = 0$,

where $V_i$ is a working covariance matrix.

3. Asymptotic properties when $p_n \to \infty$.

3.1. GEE estimator with estimated working correlation matrix. In applications,
the true correlation matrix of $Y_i$, denoted by $R_0$, is unknown.
The working covariance matrix is often specified via a working correlation
matrix $R(\tau)$: $V_i = A_i^{1/2}(\beta_n) R(\tau) A_i^{1/2}(\beta_n)$, where $\tau$ is a finite-dimensional
parameter. Commonly used working correlation structures include AR-1,
compound symmetry and unstructured working correlation, among others.
Note that, in practice, the working correlation matrix is chosen to be independent
of the covariates, for simplicity. However, for correlated non-normal
data, the range of correlation generally depends on the univariate marginals.
Thus, $R(\tau)$ should be understood as a weight matrix [Chaganty and Joe
(2004)]. Chaganty and Joe demonstrated that GEE with an appropriately 
chosen working correlation matrix does have good efficiency when compared 
with a proper likelihood model. 

Given a working correlation structure, $\tau$ is often estimated using a residual-based
moment method, which requires an initial consistent estimator of $\beta_{n0}$.
We use $\hat R$ to denote the resulting estimated working correlation matrix, with
the subscript "n" suppressed. Following (2.2), we formally define the GEE
estimator $\hat\beta_n$ as the solution of

(3.1) $S_n(\beta_n) = \sum_{i=1}^{n} X_i^T A_i^{1/2}(\beta_n) \hat R^{-1} A_i^{-1/2}(\beta_n) (Y_i - \pi_i(\beta_n)) = 0$.

To solve for $\hat\beta_n$, we can iterate between a modified Fisher scoring algorithm
for $\beta_n$ and the moment estimation for $\tau$. In the following, we provide examples
of an initial consistent estimator and an estimated working correlation
matrix.
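
The scoring step of this iteration can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the paper's implementation. It assumes a fixed working correlation matrix `R`, so the moment-estimation step for the correlation parameter is omitted. It also uses only the term $\sum_i X_i^T A_i^{1/2} R^{-1} A_i^{1/2} X_i$ as the scoring matrix. The function name `gee_logit` and the array layout `(n, m, p)` are assumptions made for this sketch.

```python
import numpy as np


def gee_logit(X, Y, R, n_iter=25, tol=1e-8):
    """Solve the binary-response GEE for a FIXED working correlation R.

    X : array of shape (n, m, p), covariates X_ij
    Y : array of shape (n, m), binary responses Y_ij
    R : array of shape (m, m), working correlation (assumed known here)
    """
    n, m, p = X.shape
    R_inv = np.linalg.inv(R)
    beta = np.zeros(p)
    for _ in range(n_iter):
        S = np.zeros(p)       # estimating function evaluated at beta
        H = np.zeros((p, p))  # scoring matrix sum_i X_i^T A^{1/2} R^{-1} A^{1/2} X_i
        for i in range(n):
            pi = 1.0 / (1.0 + np.exp(-(X[i] @ beta)))
            a_half = np.sqrt(pi * (1.0 - pi))   # diagonal of A_i^{1/2}
            W = X[i].T * a_half                 # X_i^T A_i^{1/2}, shape (p, m)
            S += W @ R_inv @ ((Y[i] - pi) / a_half)
            H += W @ R_inv @ W.T
        step = np.linalg.solve(H, S)            # modified Fisher scoring update
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta
```

With `R` equal to the identity matrix the update reduces to ordinary logistic-regression Fisher scoring, which is the working-independence case of Example 1 below.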

Example 1 (Initial estimator for $\beta_{n0}$ when $p_n \to \infty$). A simple way
to obtain an initial estimator for $\beta_{n0}$ is to solve the generalized estimating
equations under the working independence assumption

(3.2) $S_n(\beta_n) = \sum_{i=1}^{n} X_i^T (Y_i - \pi_i(\beta_n)) = 0$.

Under conditions (A1)-(A3) in Section 3.2, we can show that if $p_n^2/n \to 0$
as $n \to \infty$, then the independence estimating equations in (3.2) have a root
$\tilde\beta_n$ such that

(3.3) $\|\tilde\beta_n - \beta_{n0}\| = O_p(\sqrt{p_n/n})$,

where $\|\cdot\|$ denotes the Euclidean norm of a vector. A detailed derivation of
(3.3) is given in the Appendix.

Example 2 (Estimated working correlation matrix when $p_n \to \infty$). In
Balan and Schiopu-Kratina (2005), it was suggested to use

$\hat R = \frac{1}{n} \sum_{i=1}^{n} A_i^{-1/2}(\tilde\beta_n)(Y_i - \pi_i(\tilde\beta_n))(Y_i - \pi_i(\tilde\beta_n))^T A_i^{-1/2}(\tilde\beta_n)$,

where $\tilde\beta_n$ is a preliminary $\sqrt{n/p_n}$-consistent estimator of $\beta_{n0}$, such as the
one discussed in Example 1. This provides a moment estimator of the unstructured
working correlation matrix. Assuming conditions (A1)-(A3) of
Section 3.2, we can prove that if $p_n^2/n \to 0$ as $n \to \infty$, then

(3.4) $\|\hat R^{-1} - R_0^{-1}\| = O_p(\sqrt{p_n/n})$,

where $R_0$ denotes the true common correlation matrix. Here, and throughout
the paper, for a matrix $B$, $\|B\| = [\mathrm{Tr}(BB^T)]^{1/2}$ denotes its Frobenius
norm. A detailed derivation of (3.4) is given in the supplementary article
[Wang (2010)].
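
As a concrete illustration of the moment estimator in Example 2, the sketch below averages the outer products of standardized residuals over clusters. It is a hedged example rather than the author's code: the function name `unstructured_corr` and the `(n, m, p)` array layout are assumptions, and `beta_init` stands for any preliminary estimator such as the working-independence root of Example 1.

```python
import numpy as np


def unstructured_corr(X, Y, beta_init):
    """Moment estimator of the unstructured working correlation matrix.

    X : (n, m, p) covariates, Y : (n, m) binary responses,
    beta_init : (p,) preliminary estimate of the regression parameter.
    """
    n = X.shape[0]
    pi = 1.0 / (1.0 + np.exp(-(X @ beta_init)))   # fitted means, shape (n, m)
    resid = (Y - pi) / np.sqrt(pi * (1.0 - pi))   # rows are A_i^{-1/2}(Y_i - pi_i)
    return resid.T @ resid / n                    # (m, m) average outer product
```

The result is symmetric by construction. It is not forced to have a unit diagonal, which is consistent with reading the working correlation as a weight matrix.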

3.2. Existence and consistency. In Fan and Peng (2004), Lam and Fan 
(2008) and Huang, Horowitz and Ma (2008), the estimator is defined as 
the minimizer of a certain objective function. We use alternative techniques 
here to establish the existence and consistency of the GEE estimator, which 
involve the roots of estimating equations. The approach we adopt here is also 
different from that of Xie and Yang (2003) and Balan and Schiopu-Kratina 
(2005), both of which rely on properties of injective functions. 

We directly verify the following condition: $\forall \varepsilon > 0$, there exists a constant
$\Delta > 0$ such that for all $n$ sufficiently large,

(3.5) $P\left( \sup_{\|\beta_n - \beta_{n0}\| = \Delta\sqrt{p_n/n}} (\beta_n - \beta_{n0})^T S_n(\beta_n) < 0 \right) \ge 1 - \varepsilon$.

Condition (3.5) is sufficient to ensure the existence of a sequence of roots
$\hat\beta_n$ of the equation $S_n(\beta_n) = 0$ such that $\|\hat\beta_n - \beta_{n0}\| = O_p(\sqrt{p_n/n})$. This
approach follows from Theorem 6.3.4 of Ortega and Rheinboldt (1970). In
Portnoy (1984), this technique was applied to establish the existence and 
consistency of an M-estimator for i.i.d. data; in a different setting, it was 
used by Wang et al. (2010) to study a partial linear single-index model. This 
leads to a more straightforward and elegant proof of weak consistency. On 
the other hand, the method relying on injective functions [Xie and Yang 
(2003); Balan and Schiopu-Kratina (2005)] can also be used to prove strong 
consistency. 

To prove consistency and asymptotic normality, we need the following 
general regularity conditions: 

(A1) $\sup_{i,j} \|X_{ij}\| = O(\sqrt{p_n})$;

(A2) the unknown parameter $\beta_n$ belongs to a compact subset $B \subseteq \mathbb{R}^{p_n}$,
the true parameter value $\beta_{n0}$ lies in the interior of $B$ and there exist two
positive constants, $b_1$ and $b_2$, such that $0 < b_1 \le \pi_{ij}(\beta_{n0}) \le b_2 < 1$, $\forall i, j$;

(A3) there exist two positive constants, $b_3$ and $b_4$, such that

$b_3 \le \lambda_{\min}\left(n^{-1}\sum_{i=1}^{n} X_i^T X_i\right) \le \lambda_{\max}\left(n^{-1}\sum_{i=1}^{n} X_i^T X_i\right) \le b_4$,

where $\lambda_{\min}$ (resp. $\lambda_{\max}$) denotes the minimum (resp. maximum) eigenvalue
of a matrix;

(A4) the common true correlation matrix $R_0$ has eigenvalues bounded
away from zero and $+\infty$; the estimated working correlation matrix $\hat R$ satisfies
$\|\hat R^{-1} - \bar R^{-1}\| = O_p(\sqrt{p_n/n})$, where $\bar R$ is a constant positive definite
matrix with eigenvalues bounded away from zero and $+\infty$; we do not require
$\bar R$ to be the true correlation matrix $R_0$.

Remark 1. Condition (Al) is a common assumption in the literature 
on M-estimators with diverging dimension. For example, it is the same as 
assumption (3.9) of Portnoy (1985) and it is implied by conditions (C.9) 
and (C.IO) of Welsh (1989). This condition holds almost surely under some 
weak moment conditions for Xij from spherically symmetric distributions 
[see, e.g., the discussions in He and Shao (2000)]. When m = 1 (i.e., each 
cluster has only one observation), condition (A3) is also popularly adopted 
in the literature on high-dimensional regression for independent data. It can 
be shown that condition (A3) is implied by the following slightly stronger
condition: there exist two positive constants, $c_1 < c_2$, such that $\forall 1 \le j \le m$,

$c_1 \le \lambda_{\min}\left(n^{-1}\sum_{i=1}^{n} X_{ij}X_{ij}^T\right) \le \lambda_{\max}\left(n^{-1}\sum_{i=1}^{n} X_{ij}X_{ij}^T\right) \le c_2$.

Finally, condition (A4) is a direct extension of a similar assumption in the
"fixed p" case. Liang and Zeger (1986) assume that the estimator $\hat\tau$ of the
working correlation matrix parameter $\tau$ satisfies $\sqrt{n}(\hat\tau - \tau_0) = O_p(1)$ for
some $\tau_0$. Assumption (C2) of Chen and Jin (2006) is of a similar nature, while
Xie and Yang (2003) assume the nuisance parameter $\tau$ to be completely
known. Note that Example 2 in Section 3.1 guarantees that (A4) is satisfied
when a nonparametric moment estimator is used for the working correlation
matrix, in which case $\bar R = R_0$.

We use notation similar to that in Xie and Yang (2003) and Balan and 
Schiopu-Kratina (2005). Consider the following estimating equation:

$\bar S_n(\beta_n) = \sum_{i=1}^{n} X_i^T A_i^{1/2}(\beta_n) \bar R^{-1} A_i^{-1/2}(\beta_n) (Y_i - \pi_i(\beta_n))$.

If we let $\bar M_n(\beta_n)$ denote the covariance matrix of $\bar S_n(\beta_n)$, then

$\bar M_n(\beta_n) = \sum_{i=1}^{n} X_i^T A_i^{1/2}(\beta_n) \bar R^{-1} R_0 \bar R^{-1} A_i^{1/2}(\beta_n) X_i$.

To prove the consistency, the essential idea is to approximate $S_n(\beta_n)$ by
$\bar S_n(\beta_n)$, whose moments are easier to evaluate. Lemma 3.1 below establishes
the accuracy of this approximation, which also plays an important role in
deriving the asymptotic normality in Section 3.3.

Lemma 3.1. Assume conditions (A1)-(A4). If $n^{-1}p_n^2 = o(1)$, then

$\|S_n(\beta_{n0}) - \bar S_n(\beta_{n0})\| = O_p(p_n)$.

To facilitate the Taylor expansion of the estimating function $S_n(\beta_n)$, we
also use $\bar D_n(\beta_n) = -\frac{\partial}{\partial\beta_n^T}\bar S_n(\beta_n)$ to approximate the negative gradient
function $D_n(\beta_n) = -\frac{\partial}{\partial\beta_n^T} S_n(\beta_n)$. Lemma 3.2 below provides a useful representation
of $\bar D_n(\beta_n)$, based on which Lemma 3.3 establishes the approximation
of gradient functions.

Lemma 3.2.

(3.6) $\bar D_n(\beta_n) = \bar H_n(\beta_n) + \bar E_n(\beta_n) + \bar G_n(\beta_n)$,

where

$\bar H_n(\beta_n) = \sum_{i=1}^{n} X_i^T A_i^{1/2}(\beta_n) \bar R^{-1} A_i^{1/2}(\beta_n) X_i$,

$\bar G_n(\beta_n) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} (1 - 2\pi_{ij}(\beta_n)) A_{ij}^{1/2}(\beta_n) X_{ij} X_{ij}^T e_j^T \bar R^{-1} \varepsilon_i(\beta_n)$,

where $\varepsilon_{ij}(\beta_n) = A_{ij}^{-1/2}(\beta_n)(Y_{ij} - \pi_{ij}(\beta_n))$, $\varepsilon_i(\beta_n) = A_i^{-1/2}(\beta_n)(Y_i - \pi_i(\beta_n))$
and $e_j$ denotes a unit vector of length $m$ whose $j$th entry is 1 and all other
entries of which are 0.
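
The three-term decomposition of Lemma 3.2 can be traced by differentiating the estimating function with fixed correlation matrix $\bar R$ term by term. The sketch below is a reader's derivation from the definitions of $\pi_{ij}$, $A_{ij}$, $\varepsilon_{ij}$, $\varepsilon_i$ and $e_j$ given in the text, not the paper's own display. In particular, the expression shown for $\bar E_n$ is obtained here by direct differentiation and may be arranged differently from the author's formula.

```latex
% Building blocks for the logit link (arguments \beta_n suppressed):
\frac{\partial \pi_{ij}}{\partial \beta_n^T} = A_{ij} X_{ij}^T, \qquad
\frac{\partial A_{ij}^{1/2}}{\partial \beta_n^T} = \tfrac12 (1 - 2\pi_{ij})\, A_{ij}^{1/2} X_{ij}^T, \qquad
\frac{\partial A_{ij}^{-1/2}}{\partial \beta_n^T} = -\tfrac12 (1 - 2\pi_{ij})\, A_{ij}^{-1/2} X_{ij}^T .

% Product rule applied to
% \bar S_n = \sum_i X_i^T A_i^{1/2} \bar R^{-1} A_i^{-1/2} (Y_i - \pi_i):
-\frac{\partial \bar S_n}{\partial \beta_n^T}
 = \underbrace{\sum_{i} X_i^T A_i^{1/2} \bar R^{-1} A_i^{1/2} X_i}_{\bar H_n \ (\text{from } \pi_i)}
 \; + \; \underbrace{\tfrac12 \sum_{i}\sum_{j} (1 - 2\pi_{ij})\, \varepsilon_{ij}\,
      X_i^T A_i^{1/2} \bar R^{-1} e_j X_{ij}^T}_{\bar E_n \ (\text{from } A_i^{-1/2})}
 \; \underbrace{{} - \tfrac12 \sum_{i}\sum_{j} (1 - 2\pi_{ij})\, A_{ij}^{1/2}
      X_{ij} X_{ij}^T \, e_j^T \bar R^{-1} \varepsilon_i}_{\bar G_n \ (\text{from } A_i^{1/2})} .
```

Both the second and third terms are linear in the standardized residuals, which have mean zero at the true parameter value. This is why the first term serves as the leading term in the approximations that follow.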

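Lemma 3.3. Assume conditions (A1)-(A4). If $n^{-1}p_n^2 = o(1)$, then $\forall \Delta > 0$,
for $b_n \in \mathbb{R}^{p_n}$, we have

$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \; \sup_{\|b_n\| = 1} |b_n^T[D_n(\beta_n) - \bar D_n(\beta_n)]b_n| = O_p(\sqrt{n}\,p_n)$.

Remark 2. The matrix $D_n(\beta_n) - \bar D_n(\beta_n)$ is symmetric. The above
lemma immediately implies that

$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\lambda_{\min}[D_n(\beta_n) - \bar D_n(\beta_n)]| = O_p(\sqrt{n}\,p_n)$,

$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\lambda_{\max}[D_n(\beta_n) - \bar D_n(\beta_n)]| = O_p(\sqrt{n}\,p_n)$.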

Furthermore, we can use the leading term $\bar H_n(\beta_n)$ in (3.6) to approximate
the negative gradient function $\bar D_n(\beta_n)$. This result is given by Lemma 3.4
below. Lemma 3.5 further establishes an equicontinuity result for $\bar H_n(\beta_n)$.

Lemma 3.4. Assume conditions (A1)-(A4). If $n^{-1}p_n^2 = o(1)$, then $\forall \Delta > 0$,
for $b_n \in \mathbb{R}^{p_n}$, we have

$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \; \sup_{\|b_n\| = 1} |b_n^T[\bar D_n(\beta_n) - \bar H_n(\beta_n)]b_n| = O_p(\sqrt{n}\,p_n)$.

Lemma 3.5. Assume conditions (A1)-(A4). If $n^{-1}p_n^2 = o(1)$, then $\forall \Delta > 0$,
for $b_n \in \mathbb{R}^{p_n}$, we have

$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \; \sup_{\|b_n\| = 1} |b_n^T[\bar H_n(\beta_n) - \bar H_n(\beta_{n0})]b_n| = O_p(\sqrt{n}\,p_n)$.

The proofs of Lemmas 3.1-3.4 are given in the Appendix; the proof of 
Lemma 3.5 is given in the supplementary article [Wang (2010)]. The following
theorem ensures the existence and consistency of the GEE estimator
when $p_n \to \infty$.

Theorem 3.6 (Existence and consistency). Assume conditions (A1)-(A4)
and that $n^{-1}p_n^2 = o(1)$. Then, $S_n(\beta_n) = 0$ has a root $\hat\beta_n$ such that

$\|\hat\beta_n - \beta_{n0}\| = O_p(\sqrt{p_n/n})$.

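Proof. We will prove that (3.5) holds. This requires us to evaluate the
sign of $(\beta_n - \beta_{n0})^T S_n(\beta_n)$ on $\{\beta_n : \|\beta_n - \beta_{n0}\| = \Delta\sqrt{p_n/n}\}$. Note that

$(\beta_n - \beta_{n0})^T S_n(\beta_n) = (\beta_n - \beta_{n0})^T S_n(\beta_{n0}) - (\beta_n - \beta_{n0})^T D_n(\beta_n^*)(\beta_n - \beta_{n0}) =: I_{n1} + I_{n2}$,

where $\beta_n^*$ lies between $\beta_n$ and $\beta_{n0}$, that is, $\beta_n^* = t\beta_n + (1 - t)\beta_{n0}$ for some
$0 < t < 1$. Next, we write

$I_{n1} = (\beta_n - \beta_{n0})^T \bar S_n(\beta_{n0}) + (\beta_n - \beta_{n0})^T [S_n(\beta_{n0}) - \bar S_n(\beta_{n0})] =: I_{n11} + I_{n12}$.

We have $|I_{n11}| \le \|\beta_n - \beta_{n0}\| \cdot \|\bar S_n(\beta_{n0})\| = \Delta\sqrt{p_n/n}\,\|\bar S_n(\beta_{n0})\|$ by the Cauchy-Schwarz
inequality. Furthermore,

$E[\|\bar S_n(\beta_{n0})\|^2]$

$= E\left[\sum_{i=1}^{n} \varepsilon_i^T(\beta_{n0}) \bar R^{-1} A_i^{1/2}(\beta_{n0}) X_i X_i^T A_i^{1/2}(\beta_{n0}) \bar R^{-1} \varepsilon_i(\beta_{n0})\right]$

$\le \sum_{i=1}^{n} \lambda_{\max}(X_i X_i^T)\, \lambda_{\max}(A_i(\beta_{n0}))\, \lambda_{\max}(\bar R^{-2})\, E[\varepsilon_i^T(\beta_{n0}) \varepsilon_i(\beta_{n0})]$

$\le C\, \mathrm{Tr}\left(\sum_{i=1}^{n} X_i X_i^T\right) = C \sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij}^T X_{ij} = O(np_n)$.

Here, and throughout the paper, we use $C$ to denote a generic positive
constant which may vary from line to line. Thus, $\|\bar S_n(\beta_{n0})\| = O_p(\sqrt{np_n})$.
This implies that $|I_{n11}| = \Delta O_p(p_n)$. For $I_{n12}$, we have

$|I_{n12}| \le \|\beta_n - \beta_{n0}\| \cdot \|S_n(\beta_{n0}) - \bar S_n(\beta_{n0})\| = \Delta\sqrt{p_n/n}\, O_p(p_n) = \Delta o_p(p_n)$,

by Lemma 3.1. Hence, $|I_{n1}| = \Delta O_p(p_n)$. In what follows, we evaluate $I_{n2}$:

$I_{n2} = -(\beta_n - \beta_{n0})^T \bar D_n(\beta_n^*)(\beta_n - \beta_{n0}) - (\beta_n - \beta_{n0})^T [D_n(\beta_n^*) - \bar D_n(\beta_n^*)](\beta_n - \beta_{n0}) =: I_{n21} + I_{n22}$.

First, note that

$|I_{n22}| \le \max(|\lambda_{\max}(D_n(\beta_n^*) - \bar D_n(\beta_n^*))|, |\lambda_{\min}(D_n(\beta_n^*) - \bar D_n(\beta_n^*))|)\, \|\beta_n - \beta_{n0}\|^2 = \Delta^2 o_p(p_n)$

by Lemma 3.3. On the other hand,

$I_{n21} = -(\beta_n - \beta_{n0})^T \bar H_n(\beta_{n0})(\beta_n - \beta_{n0})$

$\quad - (\beta_n - \beta_{n0})^T [\bar H_n(\beta_n^*) - \bar H_n(\beta_{n0})](\beta_n - \beta_{n0})$

$\quad - (\beta_n - \beta_{n0})^T [\bar D_n(\beta_n^*) - \bar H_n(\beta_n^*)](\beta_n - \beta_{n0})$

$=: I_{n21}^a + I_{n21}^b + I_{n21}^c$.

From Lemma 3.5, we have $I_{n21}^b = \Delta^2 o_p(p_n)$; from Lemma 3.4, we have $I_{n21}^c = \Delta^2 o_p(p_n)$.
Finally, we evaluate $I_{n21}^a$. We have

$I_{n21}^a = -(\beta_n - \beta_{n0})^T \left[\sum_{i=1}^{n} X_i^T A_i^{1/2}(\beta_{n0}) \bar R^{-1} A_i^{1/2}(\beta_{n0}) X_i\right] (\beta_n - \beta_{n0})$

$\le -\lambda_{\min}(\bar R^{-1}) \min_i \lambda_{\min}(A_i(\beta_{n0}))\, \lambda_{\min}\left(\sum_{i=1}^{n} X_i^T X_i\right) \|\beta_n - \beta_{n0}\|^2 \le -C\Delta^2 p_n$

by (A3). Thus, $(\beta_n - \beta_{n0})^T S_n(\beta_n)$ on $\{\|\beta_n - \beta_{n0}\| = \Delta\sqrt{p_n/n}\}$ is
asymptotically dominated in probability by $I_{n11} + I_{n21}^a$, which is negative
for $\Delta$ large enough. □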
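
The order-of-magnitude bookkeeping behind this argument can be collected in one place. The recap below is a reader's summary under the stated condition $p_n^2/n \to 0$, with $C > 0$ a generic constant. It writes $I_{n21}^a$, $I_{n21}^b$ and $I_{n21}^c$ for the three pieces of $I_{n21}$: the quadratic form in $\bar H_n(\beta_{n0})$, the increment of $\bar H_n$, and the remainder $\bar D_n - \bar H_n$.

```latex
% On the sphere \|\beta_n - \beta_{n0}\| = \Delta \sqrt{p_n/n}:
|I_{n11}| \le \Delta \sqrt{p_n/n} \cdot O_p(\sqrt{n p_n}) = \Delta \, O_p(p_n), \\
|I_{n12}| \le \Delta \sqrt{p_n/n} \cdot O_p(p_n) = \Delta \, o_p(p_n), \\
|I_{n22}|,\ |I_{n21}^b|,\ |I_{n21}^c|
   \le \Delta^2 \, \frac{p_n}{n} \cdot O_p(\sqrt{n}\, p_n)
   = \Delta^2 \, O_p\!\left( p_n \cdot \frac{p_n}{\sqrt{n}} \right)
   = \Delta^2 \, o_p(p_n), \\
I_{n21}^a \le - C \cdot n \cdot \Delta^2 \, \frac{p_n}{n} = - C \Delta^2 p_n .
```

The only term of exact order $\Delta^2 p_n$ is negative, so for $\Delta$ large it outweighs the $\Delta\, O_p(p_n)$ contribution of $I_{n11}$, while all remaining terms are of smaller order.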



3.3. Asymptotic normality of the GEE estimator. The asymptotic distribution
of the GEE estimator $\hat\beta_n$ is closely related to that of the ideal
estimating function $\bar S_n(\beta_{n0})$. When appropriately normalized, $\bar S_n(\beta_{n0})$ has
an asymptotic normal distribution, as shown by the following lemma.

Lemma 3.7. Assume conditions (A1)-(A4). If $n^{-1}p_n^3 = o(1)$, then $\forall \alpha_n \in \mathbb{R}^{p_n}$
such that $\|\alpha_n\| = 1$, we have

$\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) \bar S_n(\beta_{n0}) \to N(0, 1)$ in distribution.

To prove Lemma 3.7, we write $\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) \bar S_n(\beta_{n0})$ as a sum of independent
random variables and then check the Lindeberg-Feller condition for
the central limit theorem. The detailed proof is given in the Appendix. The
following theorem ensures the asymptotic normality of the GEE estimator
when $n^{-1}p_n^3 = o(1)$.

Theorem 3.8 (Asymptotic normality). Assume conditions (A1)-(A4).
If $n^{-1}p_n^3 = o(1)$, then $\forall \alpha_n \in \mathbb{R}^{p_n}$ such that $\|\alpha_n\| = 1$, we have

$\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) \bar H_n(\beta_{n0})(\hat\beta_n - \beta_{n0}) \to N(0, 1)$

in distribution.

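Proof. We have

$\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) \bar S_n(\beta_{n0})$

$= \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) S_n(\beta_{n0}) + \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]$

$= \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) D_n(\beta_n^*)(\hat\beta_n - \beta_{n0}) + \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]$

$= \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) \bar H_n(\beta_{n0})(\hat\beta_n - \beta_{n0})$

$\quad + \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [D_n(\beta_n^*) - \bar H_n(\beta_{n0})](\hat\beta_n - \beta_{n0})$

$\quad + \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]$,

where, to obtain the second equality, we note that $S_n(\hat\beta_n) = 0$ and thus, by
a Taylor expansion, $S_n(\beta_{n0}) = D_n(\beta_n^*)(\hat\beta_n - \beta_{n0})$ for some $\beta_n^*$ between $\hat\beta_n$
and $\beta_{n0}$. By Lemma 3.7, $\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) \bar S_n(\beta_{n0}) \to N(0, 1)$. Therefore, to
prove the theorem, it is sufficient to verify that $\forall \Delta > 0$,

(3.7) $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [D_n(\beta_n) - \bar H_n(\beta_{n0})](\hat\beta_n - \beta_{n0})| = o_p(1)$

and

(3.8) $|\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]| = o_p(1)$.

We prove (3.8) first. Note that

$\left[\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]\right]^2$

$= \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar S_n(\beta_{n0}) - S_n(\beta_{n0})][\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]^T \bar M_n^{-1/2}(\beta_{n0}) \alpha_n$

$\le \lambda_{\max}(\bar M_n^{-1}(\beta_{n0}))\, \lambda_{\max}\left([\bar S_n(\beta_{n0}) - S_n(\beta_{n0})][\bar S_n(\beta_{n0}) - S_n(\beta_{n0})]^T\right)$

$\le \frac{\|S_n(\beta_{n0}) - \bar S_n(\beta_{n0})\|^2}{\lambda_{\min}(\bar M_n(\beta_{n0}))}$

$\le \frac{\|S_n(\beta_{n0}) - \bar S_n(\beta_{n0})\|^2}{C \lambda_{\min}\left(\sum_{i=1}^{n} X_i^T X_i\right)}$

$= O_p(p_n^2/n) = o_p(1)$,

by Lemma 3.1 and the fact that

(3.9) $\lambda_{\min}(\bar M_n(\beta_{n0})) \ge C \lambda_{\min}\left(\sum_{i=1}^{n} X_i^T X_i\right)$.

A justification of (3.9) is given in the proof of Lemma 3.7 in the Appendix.
Thus, (3.8) holds. Next, we prove (3.7). We have

$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [D_n(\beta_n) - \bar H_n(\beta_{n0})](\hat\beta_n - \beta_{n0})|$

$\le \sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [D_n(\beta_n) - \bar D_n(\beta_n)](\hat\beta_n - \beta_{n0})|$

$\quad + \sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar D_n(\beta_n) - \bar H_n(\beta_n)](\hat\beta_n - \beta_{n0})|$

$\quad + \sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} |\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) [\bar H_n(\beta_n) - \bar H_n(\beta_{n0})](\hat\beta_n - \beta_{n0})|$

$=: I_{n1} + I_{n2} + I_{n3}$.

By the Cauchy-Schwarz inequality and Remark 2, we have

$I_{n1} \le \sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \left[\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) (D_n(\beta_n) - \bar D_n(\beta_n))^2 \bar M_n^{-1/2}(\beta_{n0}) \alpha_n\right]^{1/2} \|\hat\beta_n - \beta_{n0}\|$

$\le \sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \max(|\lambda_{\min}(D_n(\beta_n) - \bar D_n(\beta_n))|, |\lambda_{\max}(D_n(\beta_n) - \bar D_n(\beta_n))|) \times \lambda_{\min}^{-1/2}(\bar M_n(\beta_{n0}))\, O_p(p_n^{1/2} n^{-1/2})$

$= O_p(\sqrt{n}\,p_n)\, O(n^{-1/2})\, O_p(n^{-1/2} p_n^{1/2}) = O_p(n^{-1/2} p_n^{3/2}) = o_p(1)$.

Hence, $I_{n1} = o_p(1)$. By the same argument and Lemmas 3.4 and 3.5, we also
have $I_{n2} = o_p(1)$ and $I_{n3} = o_p(1)$. This proves (3.7). □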

Remark 3. Note that the condition $n^{-1}p_n^3 = o(1)$ is the same as that
of Huber (1973) for an M-estimator with independent data and a diverging
number of parameters. It is weaker than the condition $n^{-1}p_n^5 = o(1)$ in Fan
and Peng (2004) and Lam and Fan (2008) for asymptotic normality.

Remark 4. Combining the result of Theorem 3.8 with the Cramér-Wold
device, it is easy to see that for any $l \times p_n$ matrix $B_n$ with $l$ fixed and
such that $B_n B_n^T \to F$, a positive definite matrix, we have

$B_n \Sigma_n^{-1/2}(\beta_{n0})(\hat\beta_n - \beta_{n0}) \to N_l(0, F)$,

where $\Sigma_n = \Sigma_n(\beta_{n0}) = \bar H_n^{-1}(\beta_{n0}) \bar M_n(\beta_{n0}) \bar H_n^{-1}(\beta_{n0})$.

Now, take $B_n = (L_n \Sigma_n L_n^T)^{-1/2} L_n \Sigma_n^{1/2}$, where $L_n$ is an $l \times p_n$ matrix such
that $L_n \Sigma_n L_n^T$ is invertible. Then, $B_n B_n^T = I_l$ and we have the following
corollary, which gives the asymptotic distribution of $L_n(\hat\beta_n - \beta_{n0})$.

Corollary 3.9. Under the same conditions as in Theorem 3.8, if $n^{-1}p_n^3 = o(1)$, then

$(L_n \Sigma_n L_n^T)^{-1/2} L_n (\hat\beta_n - \beta_{n0}) \to N_l(0, I_l)$

in distribution.

3.4. Sandwich covariance formula and large-sample Wald test. Theorem
3.8 and Corollary 3.9 suggest that the covariance matrix of $\hat\beta_n$ is approximately
$\Sigma_n$. To estimate $\Sigma_n$, Liang and Zeger (1986) proposed, in the "fixed
p" setup, the following well-known sandwich covariance matrix estimator:

$\hat\Sigma_n = \hat H_n^{-1}(\hat\beta_n) \hat M_n(\hat\beta_n) \hat H_n^{-1}(\hat\beta_n)$,

where $\hat H_n(\hat\beta_n)$ is defined similarly as $\bar H_n(\beta_n)$, but with $\bar R$ replaced by $\hat R$;
$\hat M_n(\hat\beta_n)$ is defined similarly as $\bar M_n(\beta_n)$, except that $\bar R$ is replaced by $\hat R$
and the unknown true correlation matrix $R_0$ is replaced by $\varepsilon_i(\hat\beta_n)\varepsilon_i^T(\hat\beta_n)$,
with $\varepsilon_i(\cdot)$ defined in Lemma 3.2. Based on Corollary 3.9 and the sandwich
covariance matrix estimator, an asymptotic $100(1 - \alpha)\%$ confidence interval
($0 < \alpha < 1$) for $\beta_j$ is

(3.10) $\hat\beta_j \pm z_{\alpha/2} \sqrt{u_j^T \hat\Sigma_n u_j}$,

where $z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of the standard normal distribution
and $u_j$ is the unit vector of length $p_n$ with the $j$th element equal to 1 and
all the other elements equal to 0.
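
The sandwich estimator and the interval (3.10) can be illustrated with a short sketch. This is an assumption-laden example, not the paper's software: the names `sandwich_cov` and `wald_ci` and the `(n, m, p)` array layout are hypothetical, and the estimate `beta_hat` and working correlation `R_hat` are taken as given inputs. The "meat" of the sandwich is built from each cluster's contribution to the estimating function, which is equivalent to replacing the true correlation matrix by the outer product of standardized residuals.

```python
from statistics import NormalDist

import numpy as np


def sandwich_cov(X, Y, beta_hat, R_hat):
    """Sandwich covariance H^{-1} M H^{-1} for the binary-response GEE.

    X : (n, m, p) covariates, Y : (n, m) responses,
    beta_hat : (p,) GEE estimate, R_hat : (m, m) working correlation.
    """
    n, m, p = X.shape
    R_inv = np.linalg.inv(R_hat)
    H = np.zeros((p, p))   # "bread": sum_i X_i^T A^{1/2} R^{-1} A^{1/2} X_i
    M = np.zeros((p, p))   # "meat": sum_i u_i u_i^T
    for i in range(n):
        pi = 1.0 / (1.0 + np.exp(-(X[i] @ beta_hat)))
        a_half = np.sqrt(pi * (1.0 - pi))
        W = X[i].T * a_half                 # X_i^T A_i^{1/2}
        eps = (Y[i] - pi) / a_half          # standardized residuals of cluster i
        u = W @ R_inv @ eps                 # cluster i's term in the estimating function
        H += W @ R_inv @ W.T
        M += np.outer(u, u)
    H_inv = np.linalg.inv(H)
    return H_inv @ M @ H_inv


def wald_ci(beta_hat, Sigma_hat, j, level=0.95):
    """Normal-approximation confidence interval for the j-th coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # upper (1 - level)/2 quantile
    half = z * np.sqrt(Sigma_hat[j, j])
    return beta_hat[j] - half, beta_hat[j] + half
```

Because the meat uses empirical residual products rather than the working correlation, the resulting standard errors do not rely on the working correlation being correctly specified.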

The sandwich covariance formula plays an important role in GEE method- 
ology. In the "fixed p" setup, it is known that the sandwich covariance ma- 
trix estimator provides a consistent estimator for the variance of the GEE 
estimator, even when the working correlation matrix is misspecified. The 
following theorem shows that this appealing property is still valid when $p_n$
diverges to $\infty$ at an appropriate rate.

The proofs of Theorem 3.10 and Corollary 3.11 below are given in the 
Appendix. 

Theorem 3.10. Assume conditions (A1)-(A4) and that $n^{-1}p_n^3 = o(1)$.
Then,

where $C_n$ is any $l \times p_n$ matrix such that $l$ is fixed and $C_n C_n^T = G$ with $G$
being an $l \times l$ positive definite matrix.



Remark 5. It is worth pointing out a subtle issue that is sometimes 
overlooked in the existing literature on high-dimensional analysis of inde- 
pendent data. In order to justify the validity of the asymptotic confidence 
interval or large-sample test for estimable contrast, it is necessary to show 
that the convergence rate in Theorem 3.10 is $o_p(n^{-1})$. Note that the estimable
contrast is asymptotically normal with convergence rate $O_p(n^{-1/2})$;
see, for example, Corollary 2.1 in He and Shao (2000) for the case of an
M-estimator based on independent data. In the literature, sometimes only
the $o_p(1)$ rate is provided, which is not adequate, but can be fixed.

Next, we consider the large-sample Wald test for testing the following
linear hypothesis:

$H_0: L_n\beta_{n0} = 0$ vs. $H_1: L_n\beta_{n0} \ne 0$,

where $L_n$ is an $l \times p_n$ matrix with $l$ fixed and $L_n L_n^T = I_l$. The Wald test
statistic is defined as

$W_n = (L_n\hat\beta_n)^T (L_n \hat\Sigma_n L_n^T)^{-1} (L_n\hat\beta_n)$.

The corollary below shows that the Wald test remains valid, even when the
number of covariates diverges with the sample size.

Corollary 3.11. Assume conditions (A1)-(A4). If $n^{-1}p_n^3 = o(1)$, then
$W_n \to \chi_l^2$ in distribution under $H_0$, where $\chi_l^2$ denotes the chi-square distribution with
$l$ degrees of freedom.
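
Computing the Wald statistic from an estimate and its sandwich covariance is a one-line quadratic form. The sketch below is illustrative only; the function name `wald_statistic` and its argument names are assumptions, and the inputs are taken as already computed. The statistic is then compared with a chi-square critical value whose degrees of freedom equal the number of rows of the contrast matrix (for a single contrast at the 5% level, about 3.84).

```python
import numpy as np


def wald_statistic(beta_hat, Sigma_hat, L):
    """Wald statistic (L b)^T (L Sigma L^T)^{-1} (L b) for H0: L beta = 0.

    beta_hat : (p,) estimate, Sigma_hat : (p, p) covariance estimate,
    L : (l, p) contrast matrix (a single row may be passed as a 1-D array).
    """
    L = np.atleast_2d(np.asarray(L, dtype=float))
    Lb = L @ beta_hat
    # Solve a linear system instead of forming the inverse explicitly.
    return float(Lb @ np.linalg.solve(L @ Sigma_hat @ L.T, Lb))
```

Solving the linear system rather than inverting the covariance of the contrasts is a standard numerical choice and does not change the value of the statistic.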



Remark 6. For testing a high-dimensional hypothesis $H_0: \beta_n = \beta_{n0}^*$
versus $H_1: \beta_n \ne \beta_{n0}^*$, it can be shown that



(3.11) 



$N(0, 1)$



in distribution under Hq, under some regularity conditions. A proof of this 
result is given in the supplementary article [Wang (2010)]. 

Table 1

The simulated average mean squared error (×10) for estimating
$\beta_{n0}$ using four different working correlation structures

                  Working correlation structure
    n    p_n       IN       UN       CS     AR-1
  500     19    0.265    0.156    0.154    0.179
 1000     24    0.141    0.103    0.100    0.111
 2000     31    0.090    0.074    0.071    0.075
 3000     36    0.070    0.065    0.063    0.065


4. Numerical studies. We consider the following model for the marginal
expectation of $Y_{ij}$, $i = 1, \ldots, n$, given $X_{ij}$,

(4.1) $\mathrm{logit}(\pi_{ij}) = X_{ij}^T\beta_{n0}$, $j = 1, 2, 3$,

where $\beta_{n0}$ is a $p_n$-dimensional vector of parameters with $p_n = \lfloor 2.5 n^{1/3} \rfloor$,
with $\lfloor q \rfloor$ denoting the largest integer not greater than $q$. In this example,
$\beta_{n0}^T = (0.4 \cdot 1_k^T, -0.3 \cdot 1_k^T, 0.2 \cdot 1_k^T, -0.1 \cdot 1_{p_n - 3k}^T)$, where $1_k$ denotes a $k$-dimensional
vector of 1's and $k = \lfloor p_n/4 \rfloor$. In addition, $X_{ij} = (x_{ij1}, \ldots, x_{ijp_n})^T$
has a multivariate normal distribution with mean zero, marginal variance
0.2 and an AR-1 correlation matrix with autocorrelation coefficient 0.5. The
binary response vector for each cluster has the above marginal mean and an
exchangeable (also called compound symmetry or CS) correlation structure
with correlation coefficient 0.5. Such correlated binary data are generated
using Bahadur's representation [see, e.g., Fitzmaurice (1995)].
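
The dimension schedule and the true coefficient vector of this design are easy to reproduce. The sketch below uses only the Python standard library and rests on two assumptions: that the dimension is the integer part of 2.5 times the cube root of n, and that the block length k is the integer part of a quarter of that dimension. The function name `design` is hypothetical.

```python
import math


def design(n):
    """Return (p_n, k, beta_n0) for the simulation design with n clusters."""
    p_n = math.floor(2.5 * n ** (1.0 / 3.0))   # dimension grows like n^(1/3)
    k = p_n // 4                               # block length (assumed floor(p_n / 4))
    # Four blocks of coefficients; the last block absorbs the remainder.
    beta = [0.4] * k + [-0.3] * k + [0.2] * k + [-0.1] * (p_n - 3 * k)
    return p_n, k, beta
```

For n = 500, 2000 and 3000 this gives the dimensions 19, 31 and 36 listed in Table 1. For n = 1000 the exact value of 2.5 times the cube root is 25, whereas Table 1 lists 24. A floating-point cube root that lands just below 10 would produce 24, although the paper does not say how the value was computed.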

Since, for different sample sizes, the parameter dimension is different, we 
measure the accuracy of estimation by the simulated average mean square 
error, which is obtained by averaging $\|\hat\beta_n - \beta_{n0}\|^2/p_n$ over 500 simulated
samples. Table 1 reports simulation results using four different working 
correlation structures: independence working correlation matrix (IN), un- 
structured working correlation matrix (UN), compound symmetry working 
correlation matrix (CS) and the first order autocorrelation working correla- 
tion matrix (AR-1), for sample sizes re = 500, 1000, 2000 and 3000. Table 
1 demonstrates that when the covariate dimension grows at an appropriate 
rate with the sample size, the accuracy of GEE estimator is satisfactory. 
We also observe that when the true correlation matrix (CS in this case) is 
adopted, the estimator is more efficient. 
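
The accuracy measure just described, $\|\hat\beta_n - \beta_{n0}\|^2/p_n$ averaged over simulated samples, can be written as a short helper. This is an illustrative sketch; the function name and the toy inputs are hypothetical and are not taken from the paper.

```python
def average_mse(estimates, beta_true):
    """Average of ||beta_hat - beta_true||^2 / p over a list of estimates."""
    p = len(beta_true)
    total = 0.0
    for beta_hat in estimates:
        squared_error = sum((e - b) ** 2 for e, b in zip(beta_hat, beta_true))
        total += squared_error / p
    return total / len(estimates)


# Toy input: two "replications" of a 2-dimensional estimate.
toy_value = average_mse([[1.0, 2.0], [1.0, 0.0]], [1.0, 1.0])
```

Dividing by $p_n$ is what makes values comparable across rows whose parameter dimensions differ.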

We next examine the accuracy of the sandwich variance formula. The
standard deviations of the estimated coefficients over 500 simulations are
averaged and regarded as the true standard error (SD). Table 2 compares SD
with the standard error obtained from the sandwich variance formula (SD2)
when the unstructured working correlation matrix is used for estimating
$\beta_k$, $\beta_{2k}$, $\beta_{3k}$ and $\beta_{p_n}$. We observe that the sandwich variance formula works
remarkably well. Similar phenomena are also observed for estimating other
regression coefficients and with other working correlation structures, but,
for reasons of brevity, these are not reported.
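
One way to quantify the agreement between SD and SD2 is the relative gap $|\mathrm{SD2} - \mathrm{SD}|/\mathrm{SD}$. The sketch below applies it to the values of Table 2, copied in reading order; the pairing of consecutive entries as (SD, SD2) follows the table header, and the helper names are hypothetical.

```python
# (SD, SD2) pairs from Table 2, four coefficients per sample size.
TABLE2 = {
    500: [(0.126, 0.111), (0.114, 0.110), (0.117, 0.111), (0.089, 0.098)],
    1000: [(0.082, 0.083), (0.079, 0.083), (0.085, 0.083), (0.072, 0.074)],
    2000: [(0.073, 0.060), (0.063, 0.060), (0.065, 0.060), (0.051, 0.053)],
    3000: [(0.060, 0.051), (0.049, 0.051), (0.052, 0.051), (0.051, 0.045)],
}


def relative_gap(sd, sd2):
    """Relative discrepancy of the sandwich estimate from the empirical SD."""
    return abs(sd2 - sd) / sd


gaps = {n: [relative_gap(sd, sd2) for sd, sd2 in pairs]
        for n, pairs in TABLE2.items()}

# Sample size and value of the largest relative gap in the table.
worst_n, worst_gap = max(((n, max(g)) for n, g in gaps.items()),
                         key=lambda item: item[1])
```

The dictionary `gaps` gives one relative gap per table cell, which makes it easy to see where the two columns differ most.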

Finally, we investigate hypothesis testing based on the large-sample Wald
test. We consider model (4.1) with $n = 1000$, $p_n = 24$ and
$\beta_{n0} = (0.4 \cdot 1_5^T, -0.3 \cdot 1_5^T, 0.2 \cdot 1_5^T, -0.1 \cdot 1_5^T, 0, 0, 0, 0)^T$. The left panel of Figure 1 depicts
the density of the Wald test statistic under the null hypothesis
$H_0: \beta_{21} = \beta_{22} = \beta_{23} = \beta_{24} = 0$ and compares it with the density curve of the $\chi^2_4$ distribution. It
demonstrates that the approximation given in Corollary 3.11 is accurate.
The right panel of Figure 1 gives the normal Q-Q plot for the Wald test
statistic under the null hypothesis $\beta_n = \beta_{n0}$ and it shows that the null
distribution can be approximated well by a normal distribution for testing a
higher-dimensional alternative, as discussed in Remark 6.
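
For a null hypothesis that sets four coefficients to zero, the Wald statistic is a quadratic form in the four estimates, and its reference distribution is chi-square with four degrees of freedom. The sketch below is a generic illustration under stated assumptions: `theta_hat` and `cov` stand for the four estimated coefficients and their estimated covariance (for example from a sandwich formula), the small linear solver is a plain Gaussian elimination written for self-containment, and none of the names come from the paper.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def wald_statistic(theta_hat, cov):
    """W = theta_hat^T cov^{-1} theta_hat for the null theta = 0."""
    z = solve(cov, theta_hat)
    return sum(t * v for t, v in zip(theta_hat, z))


def chi2_sf_4df(x):
    """P(chi^2_4 > x) = exp(-x/2) * (1 + x/2), valid for x >= 0."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Toy example: four estimates with independent variances 0.01.
toy_cov = [[0.01 if i == j else 0.0 for j in range(4)] for i in range(4)]
toy_w = wald_statistic([0.1, 0.1, 0.1, 0.1], toy_cov)
toy_p = chi2_sf_4df(toy_w)
```

The closed form for the tail probability is specific to four degrees of freedom; other degrees of freedom would need a general chi-square routine.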

5. Discussions. 

5.1. Extension to general GEE. Although the focus of the paper is on 
clustered binary data, the approaches and techniques can be extended to 
general GEE. For general GEE, the decomposition of $D_n(\beta_n)$ given in
Lemma 3.2 has a more complex expression, and the potential unboundedness 
of Yij makes the derivation of various probability bounds and asymptotic 
equivalence more delicate. Below, we give a brief discussion of the large-p 
asymptotics for general GEE. 

Assume that the first two marginal moments of $Y_{ij}$ are $\mu_{ij}(\beta_n) := E_{\beta_n}(Y_{ij}) =
\mu(\theta_{ij})$ and $\sigma^2_{ij}(\beta_n) := \operatorname{Var}_{\beta_n}(Y_{ij}) = \dot\mu(\theta_{ij})$, where $\theta_{ij} = X_{ij}^T\beta_n$. These
moment assumptions would follow when the marginal response variable has
a canonical exponential family distribution with scaling parameter 1. Let
$A_i(\beta_n) = \operatorname{diag}(\sigma^2_{i1}(\beta_n), \ldots, \sigma^2_{im}(\beta_n))$ and $\mu_i(\beta_n) = (\mu_{i1}(\beta_n), \ldots, \mu_{im}(\beta_n))^T$.



Table 2

Standard deviation (SD) and estimated standard deviation (SD2) using
the sandwich variance formula

                    $\beta_k$         $\beta_{2k}$        $\beta_{3k}$        $\beta_{p_n}$
     n    $p_n$     SD     SD2      SD     SD2      SD     SD2      SD     SD2
   500     19    0.126   0.111   0.114   0.110   0.117   0.111   0.089   0.098
  1000     24    0.082   0.083   0.079   0.083   0.085   0.083   0.072   0.074
  2000     31    0.073   0.060   0.063   0.060   0.065   0.060   0.051   0.053
  3000     36    0.060   0.051   0.049   0.051   0.052   0.051   0.051   0.045
[Figure 1. Panel titles: "chi^2 approximation" (left) and "Normal Q-Q Plot" (right, horizontal axis "Theoretical Quantiles").]

Fig. 1. The left panel gives the estimated null density of the large-sample Wald test
(dashed curve) and the density of the chi-square distribution with four degrees of freedom
(solid curve) for testing $H_0: \beta_{21} = \beta_{22} = \beta_{23} = \beta_{24} = 0$. The right panel gives the normal
Q-Q plot of the Wald test statistic under the null hypothesis $\beta_n = \beta_{n0}$.



The GEE estimator $\hat\beta_n$ is the solution of

(5.1)    $\sum_{i=1}^n X_i^T A_i^{1/2}(\beta_n)\hat R^{-1} A_i^{-1/2}(\beta_n)(Y_i - \mu_i(\beta_n)) = 0.$
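
To make the structure of an estimating function of the form (5.1) concrete, the sketch below evaluates its left-hand side for a marginal model with mean $\mu(\theta) = \exp(\theta)$ and variance $\exp(\theta)$ (the Poisson-type example mentioned in Remark 7). It is a minimal illustration under stated assumptions: the inverse working correlation matrix `r_inv` is supplied by the caller rather than estimated, and the function name and data layout are hypothetical.

```python
import math


def gee_score(beta, clusters, r_inv):
    """Sum over clusters of X_i^T A_i^{1/2} R^{-1} A_i^{-1/2} (Y_i - mu_i).

    clusters: list of (X_i, Y_i); X_i is a list of m covariate rows and
              Y_i a list of m responses.
    r_inv:    m x m inverse working correlation matrix (assumed given).
    """
    p = len(beta)
    score = [0.0] * p
    for x_i, y_i in clusters:
        m = len(y_i)
        theta = [sum(x * b for x, b in zip(row, beta)) for row in x_i]
        mu = [math.exp(t) for t in theta]
        sd = [math.sqrt(v) for v in mu]  # variance equals the mean here
        # A_i^{-1/2} (Y_i - mu_i): standardized residuals
        resid = [(y_i[j] - mu[j]) / sd[j] for j in range(m)]
        # R^{-1} applied to the standardized residuals
        w = [sum(r_inv[j][l] * resid[l] for l in range(m)) for j in range(m)]
        for j in range(m):
            weight = sd[j] * w[j]  # A_i^{1/2} R^{-1} A_i^{-1/2} (Y_i - mu_i)
            for k in range(p):
                score[k] += x_i[j][k] * weight
    return score


# Toy cluster with m = 2 observations and p = 2 covariates.
toy_x = [[1.0, 0.5], [1.0, -0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
score_at_fit = gee_score([0.0, 0.0], [(toy_x, [1.0, 1.0])], identity)
score_off_fit = gee_score([0.0, 0.0], [(toy_x, [2.0, 1.0])], identity)
```

With an identity working correlation the two variance factors cancel and the function reduces to $\sum_i X_i^T (Y_i - \mu_i)$, which is a convenient sanity check.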

In addition to assumptions (A1)-(A4) in Section 3.2, we adopt two
additional conditions:

(A5) there exists a finite constant $M_1 > 0$ such that
$E(\|A_i^{-1/2}(\beta_{n0})(Y_i - \mu_i(\beta_{n0}))\|^{2+\delta}) \le M_1$ for all $i$ and some $\delta > 0$;

(A6) if $B_n = \{\beta_n : \|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}\}$, then $\dot\mu(X_{ij}^T\beta_n)$, $1 \le i \le n$,
$1 \le j \le m$, are uniformly bounded away from 0 and $\infty$ on $B_n$; $\ddot\mu(X_{ij}^T\beta_n)$
and $\mu^{(3)}(X_{ij}^T\beta_n)$, $1 \le i \le n$, $1 \le j \le m$, are uniformly bounded by a finite
positive constant $M_2$ on $B_n$.

Remark 7. Condition (A5) is similar to the condition in Lemma 2 of Xie
and Yang (2003) and condition $(N_\delta)$ in Balan and Schiopu-Kratina (2005).
Condition (A6) requires $\mu^{(k)}(X_{ij}^T\beta_n)$, $k = 1, 2, 3$, to be uniformly bounded
when $\beta_n$ is in a local neighborhood around $\beta_{n0}$. This condition is generally
satisfied for GEE. For example, when the marginal model follows a Poisson
distribution, $\mu(t) = \exp(t)$, thus $\mu^{(k)}(X_{ij}^T\beta_n) = \exp(X_{ij}^T\beta_n)$, $k = 1, 2, 3$, are
uniformly bounded on $B_n$.

Theorem 5.1. Assume conditions (A1)-(A6) and that $n^{-1}p_n^2 = o(1)$.
The generalized estimating equation (5.1) then has a root $\hat\beta_n$ such that
$\|\hat\beta_n - \beta_{n0}\| = O_p(\sqrt{p_n/n})$. Furthermore, if $n^{-1}p_n^3 = o(1)$, then $\forall \alpha_n \in R^{p_n}$ such that $\|\alpha_n\| = 1$,

$$\alpha_n^T \bar M_n^{-1/2}(\beta_{n0})\bar H_n(\beta_{n0})(\hat\beta_n - \beta_{n0}) \to N(0,1)$$

in distribution, where $\bar M_n(\beta_{n0})$ and $\bar H_n(\beta_{n0})$ have the same expressions
as in Section 3.2.

A sketch of the proof of Theorem 5.1 is given in the supplementary article 
[Wang (2010)]. 

5.2. Related problems. In some scenarios, a "large n, diverging m" asymp- 
totic framework, where p is either fixed or also diverges at an appropriate 
rate, may be more appropriate. This corresponds to a real situation where 
the cluster size is itself large. For example, in a longitudinal study, doctors 
take measurements on the patients during each visit. Each patient forms a 
cluster. The cluster size is large if the number of visits is large. For a fixed p 
setting, this "large n, diverging m" asymptotic framework has been consid- 
ered by Xie and Yang (2003). A future topic of interest is to consider large 
m together with large p. 

Another interesting direction for future study is to consider a more flexible 
semiparametric specification for the generalized estimating equations in the 
large-p setting. In the classical "fixed p" setting, GEE with partially linear 
model specification has been investigated by Lin and Carroll (2001a, 2001b), 
Lin and Ying (2001), He, Zhu and Fung (2002), Fan and Li (2004), Chiou
and Müller (2005), Wang, Carroll and Lin (2005), Chen and Jin (2006), He,
Fung and Zhu (2006) and Huang, Zhang and Zhou (2007), among others. 

APPENDIX 

We use C to denote a generic positive constant that can vary from line 
to line. 

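Proof of (3.3). It suffices [Ortega and Rheinboldt (1970)] to show
that $\forall \epsilon > 0$, there exists a $\Delta > 0$ such that for all $n$ sufficiently large,
$P(\sup_{\|\beta_n - \beta_{n0}\| = \Delta\sqrt{p_n/n}} (\beta_n - \beta_{n0})^T S_n(\beta_n) < 0) \ge 1 - \epsilon$. We have

$$(\beta_n - \beta_{n0})^T S_n(\beta_n) = (\beta_n - \beta_{n0})^T S_n(\beta_{n0}) + (\beta_n - \beta_{n0})^T \frac{\partial}{\partial \beta_n^T} S_n(\beta_n^*)(\beta_n - \beta_{n0}) =: I_{n1} + I_{n2},$$

where $\beta_n^*$ lies between $\beta_{n0}$ and $\beta_n$. We first consider $I_{n1}$. For any $\beta_n$ such
that $\|\beta_n - \beta_{n0}\| = \Delta\sqrt{p_n/n}$, we have $|I_{n1}| \le \Delta\sqrt{p_n/n}\,\|S_n(\beta_{n0})\|$. Note that

$$E[\|S_n(\beta_{n0})\|^2] = E\Big[\sum_{i=1}^n (Y_i - \pi_i(\beta_{n0}))^T X_i X_i^T (Y_i - \pi_i(\beta_{n0}))\Big]$$
$$\le E\Big[\sum_{i=1}^n \lambda_{\max}(X_i X_i^T)\,\|Y_i - \pi_i(\beta_{n0})\|^2\Big]$$
$$\le C\,\mathrm{Tr}\Big(\sum_{i=1}^n X_i X_i^T\Big) = C\sum_{i=1}^n\sum_{j=1}^m X_{ij}^T X_{ij} = O(np_n),$$

by assumption (A1). Thus, $|I_{n1}| \le \Delta O_p(p_n)$. Next,

$$I_{n2} = -(\beta_n - \beta_{n0})^T\Big[\sum_{i=1}^n X_i^T A_i(\beta_{n0}) X_i\Big](\beta_n - \beta_{n0})$$
$$\quad - (\beta_n - \beta_{n0})^T\Big[\sum_{i=1}^n X_i^T\big(A_i(\beta_n^*) - A_i(\beta_{n0})\big) X_i\Big](\beta_n - \beta_{n0}) =: I_{n21} + I_{n22}.$$

Note that $I_{n21} \le -\lambda_{\min}(A_i(\beta_{n0}))\,\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)\,\|\beta_n - \beta_{n0}\|^2 \le -Cp_n\Delta^2$,
by (A3). Since $\frac{\partial}{\partial\beta_n} A_{ij}(\beta_n) = \pi_{ij}(\beta_n)(1 - \pi_{ij}(\beta_n))(1 - 2\pi_{ij}(\beta_n))X_{ij}$, we have

$$|I_{n22}| \le (\beta_n - \beta_{n0})^T\Big[\sum_{i=1}^n\sum_{j=1}^m |A_{ij}(\beta_n^*) - A_{ij}(\beta_{n0})|\, X_{ij}X_{ij}^T\Big](\beta_n - \beta_{n0})$$
$$\le \sup_{i,j}\|X_{ij}\| \cdot \|\beta_n^* - \beta_{n0}\| \cdot \|\beta_n - \beta_{n0}\|^2 \cdot \lambda_{\max}\Big(\sum_{i=1}^n X_i^T X_i\Big)$$
$$\le O(\sqrt{p_n})\,O_p(\sqrt{p_n/n})\,(\Delta^2 p_n/n)\,O(n) = \Delta^2 o_p(p_n),$$

by (A1)-(A3). Thus, for sufficiently large $\Delta$, $(\beta_n - \beta_{n0})^T S_n(\beta_n)$ is dominated
by $I_{n21}$, which is large and negative for all sufficiently large $n$. □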
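
The bound on $I_{n22}$ rests on the derivative of the logistic variance function, $\frac{d}{d\theta}[\pi(\theta)(1 - \pi(\theta))] = \pi(1 - \pi)(1 - 2\pi)$ with $\pi(\theta) = 1/(1 + e^{-\theta})$; the factor $X_{ij}$ in the proof comes from the chain rule with $\theta = X_{ij}^T\beta_n$. The following sketch compares the closed form with a central finite difference. The function names and the grid of test points are illustrative choices, not part of the paper.

```python
import math


def logistic(theta):
    """pi(theta) = 1 / (1 + exp(-theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def variance_fn(theta):
    """A(theta) = pi(theta) * (1 - pi(theta))."""
    p = logistic(theta)
    return p * (1.0 - p)


def variance_fn_derivative(theta):
    """Closed form dA/dtheta = pi (1 - pi) (1 - 2 pi)."""
    p = logistic(theta)
    return p * (1.0 - p) * (1.0 - 2.0 * p)


def central_difference(f, theta, h=1e-6):
    """Second-order finite-difference approximation of f'(theta)."""
    return (f(theta + h) - f(theta - h)) / (2.0 * h)


points = [-2.0, -0.5, 0.0, 1.0, 3.0]
max_error = max(abs(central_difference(variance_fn, t) - variance_fn_derivative(t))
                for t in points)
```

Since $|\pi(1-\pi)(1-2\pi)| \le 1/4$, the derivative is bounded regardless of $\theta$, which is what allows the difference $|A_{ij}(\beta_n^*) - A_{ij}(\beta_{n0})|$ to be controlled by $\|X_{ij}\| \cdot \|\beta_n^* - \beta_{n0}\|$.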

Proof of (3.4). The proof is given in the online supplement. □ 

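Proof of Lemma 3.1. Let $Q = \{q_{j_1,j_2}\}_{1 \le j_1,j_2 \le m}$ denote the matrix
$\hat R^{-1} - \bar R^{-1}$. Then,

$$S_n(\beta_{n0}) - \bar S_n(\beta_{n0}) = \sum_{i=1}^n\sum_{j_1=1}^m\sum_{j_2=1}^m q_{j_1,j_2}\, A_{ij_1}^{1/2}(\beta_{n0})\, A_{ij_2}^{-1/2}(\beta_{n0})\,(Y_{ij_2} - \pi_{ij_2}(\beta_{n0}))\, X_{ij_1}$$
$$= \sum_{j_1=1}^m\sum_{j_2=1}^m q_{j_1,j_2}\Big[\sum_{i=1}^n A_{ij_1}^{1/2}(\beta_{n0})\,\varepsilon_{ij_2}(\beta_{n0})\, X_{ij_1}\Big],$$

where $\varepsilon_{ij_2}(\beta_{n0}) = A_{ij_2}^{-1/2}(\beta_{n0})(Y_{ij_2} - \pi_{ij_2}(\beta_{n0}))$. Note that

$$E\Big[\Big\|\sum_{i=1}^n A_{ij_1}^{1/2}(\beta_{n0})\,\varepsilon_{ij_2}(\beta_{n0})\, X_{ij_1}\Big\|^2\Big] = \sum_{i=1}^n A_{ij_1}(\beta_{n0})\,E[\varepsilon_{ij_2}^2(\beta_{n0})]\,\|X_{ij_1}\|^2 \le C\sum_{i=1}^n \|X_{ij_1}\|^2 = O(np_n).$$

Thus, $\|\sum_{i=1}^n A_{ij_1}^{1/2}(\beta_{n0})\,\varepsilon_{ij_2}(\beta_{n0})\, X_{ij_1}\| = O_p(\sqrt{np_n})$ $\forall 1 \le j_1, j_2 \le m$. Since,
by (A4), $q_{j_1,j_2} = O_p(\sqrt{p_n/n})$ $\forall 1 \le j_1, j_2 \le m$, the proof is complete. □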

Proof of Lemma 3.2. The derivation can be found in Pan (2002). □

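Proof of Lemma 3.3. Let $\bar H_n(\beta_n)$, $\bar E_n(\beta_n)$ and $\bar G_n(\beta_n)$ be defined
the same as $H_n(\beta_n)$, $E_n(\beta_n)$ and $G_n(\beta_n)$, respectively, but with $\hat R$ replaced
by $\bar R$. By Lemma 3.2, it is sufficient to prove the following three results:

(A.1)    $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T[H_n(\beta_n) - \bar H_n(\beta_n)]b_n| = O_p(\sqrt{np_n}),$

(A.2)    $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T[E_n(\beta_n) - \bar E_n(\beta_n)]b_n| = O_p(\sqrt{np_n}),$

(A.3)    $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T[G_n(\beta_n) - \bar G_n(\beta_n)]b_n| = O_p(\sqrt{np_n}).$

We have

$$|b_n^T[H_n(\beta_n) - \bar H_n(\beta_n)]b_n| = \Big|\sum_{i=1}^n b_n^T X_i^T A_i^{1/2}(\beta_n)[\hat R^{-1} - \bar R^{-1}]A_i^{1/2}(\beta_n) X_i b_n\Big|$$
$$\le \|\hat R^{-1} - \bar R^{-1}\|\,\lambda_{\max}(A_i(\beta_n))\,\lambda_{\max}\Big(\sum_{i=1}^n X_i^T X_i\Big)\|b_n\|^2.$$

By assumptions (A2) and (A4), (A.1) is proved. Next, note that

$$|b_n^T[E_n(\beta_n) - \bar E_n(\beta_n)]b_n| = \Big|\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m (1 - 2\pi_{ij}(\beta_n))\,\varepsilon_{ij}(\beta_n)\, b_n^T X_i^T A_i^{1/2}(\beta_n)[\hat R^{-1} - \bar R^{-1}]e_j e_j^T X_i b_n\Big|$$
$$\le C\sum_{i=1}^n\sum_{j=1}^m |b_n^T X_i^T A_i^{1/2}(\beta_n)[\hat R^{-1} - \bar R^{-1}]e_j| \cdot |e_j^T X_i b_n|$$
$$\le C\sum_{i=1}^n\sum_{j=1}^m \|X_i b_n\| \cdot \|\hat R^{-1} - \bar R^{-1}\| \cdot \|A_i^{1/2}(\beta_n)\| \cdot \|X_i b_n\|.$$

Thus,

$$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T[E_n(\beta_n) - \bar E_n(\beta_n)]b_n|$$
$$\le C\|\hat R^{-1} - \bar R^{-1}\| \cdot \sum_{i=1}^n\sum_{j=1}^m\ \sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \|A_i^{1/2}(\beta_n)\|\ \sup_{\|b_n\| = 1}\|X_i b_n\|^2$$
$$= O_p(\sqrt{p_n/n})\,O(n) = O_p(\sqrt{np_n}),$$

by assumption (A3). (A.3) is proved similarly. □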

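Proof of Lemma 3.4. By (3.6), it is sufficient to verify that

(A.4)    $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T \bar E_n(\beta_n) b_n| = O_p(\sqrt{n}\,p_n),$

(A.5)    $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T \bar G_n(\beta_n) b_n| = O_p(\sqrt{n}\,p_n).$

First, note that we have the following decomposition of $\bar E_n(\beta_n)$:

$$\bar E_n(\beta_n) = \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m (1 - 2\pi_{ij}(\beta_{n0}))\,\varepsilon_{ij}(\beta_{n0})\, X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1} e_j e_j^T X_i$$
$$\quad + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m (1 - 2\pi_{ij}(\beta_{n0}))\,\varepsilon_{ij}(\beta_{n0})\, X_i^T [A_i^{1/2}(\beta_n) - A_i^{1/2}(\beta_{n0})]\bar R^{-1} e_j e_j^T X_i$$
$$\quad + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m \big[(1 - 2\pi_{ij}(\beta_n))A_{ij}^{-1/2}(\beta_n) - (1 - 2\pi_{ij}(\beta_{n0}))A_{ij}^{-1/2}(\beta_{n0})\big]\cdots$$
$$\quad + \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m \cdots \times X_i^T A_i^{1/2}(\beta_n)\bar R^{-1} e_j e_j^T X_i$$
$$=: \bar E_{1n}(\beta_{n0}) + \sum_{k=2}^4 \bar E_{kn}(\beta_n).$$

Thus, to prove (A.4), it suffices to verify that $\sup_{\|b_n\| = 1} |b_n^T \bar E_{1n}(\beta_{n0}) b_n| =
O_p(\sqrt{n}\,p_n)$ and $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \sup_{\|b_n\| = 1} |b_n^T \bar E_{kn}(\beta_n) b_n| = O_p(\sqrt{n}\,p_n)$, $k = 2, 3, 4$.

We first prove that $\sup_{\|b_n\| = 1} |b_n^T \bar E_{1n}(\beta_{n0}) b_n| = O_p(\sqrt{n}\,p_n)$, by verifying
that $\|\bar E_{1n}(\beta_{n0})\| = O_p(\sqrt{n}\,p_n)$, where $\|\bar E_{1n}(\beta_{n0})\| = \sqrt{\operatorname{trace}(\bar E_{1n}(\beta_{n0})\bar E_{1n}^T(\beta_{n0}))}$:

$$E[\|\bar E_{1n}(\beta_{n0})\|^2] = \frac{1}{4}\sum_{i=1}^n\sum_{j_1=1}^m\sum_{j_2=1}^m (1 - 2\pi_{ij_1}(\beta_{n0}))(1 - 2\pi_{ij_2}(\beta_{n0}))\,E[\varepsilon_{ij_1}(\beta_{n0})\varepsilon_{ij_2}(\beta_{n0})]$$
$$\qquad \times \operatorname{trace}\big[X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1} e_{j_1} e_{j_1}^T X_i X_i^T e_{j_2} e_{j_2}^T \bar R^{-1} A_i^{1/2}(\beta_{n0}) X_i\big]$$
$$\le C\sum_{i=1}^n\sum_{j_1=1}^m\sum_{j_2=1}^m \big|e_{j_1}^T X_i X_i^T e_{j_2} e_{j_2}^T \bar R^{-1} A_i^{1/2}(\beta_{n0}) X_i X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1} e_{j_1}\big|$$
$$\le C\sum_{i=1}^n\sum_{j_1=1}^m\sum_{j_2=1}^m \|e_{j_1}^T X_i\| \cdot \|X_i^T e_{j_2}\| \cdot \|e_{j_2}^T \bar R^{-1} A_i^{1/2}(\beta_{n0}) X_i\| \times \|X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1} e_{j_1}\|.$$

Note that $\|e_{j_1}^T X_i\| = \|X_{ij_1}\|$, $\|X_i^T e_{j_2}\| = \|X_{ij_2}\|$, $\|e_{j_2}^T \bar R^{-1} A_i^{1/2}(\beta_{n0}) X_i\| \le
C(\operatorname{trace}(X_i X_i^T))^{1/2}$ and $\|X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1} e_{j_1}\| \le C(\operatorname{trace}(X_i X_i^T))^{1/2}$. Thus,

$$E[\|\bar E_{1n}(\beta_{n0})\|^2] \le C\sum_{i=1}^n\sum_{j_1=1}^m\sum_{j_2=1}^m \|X_{ij_1}\| \cdot \|X_{ij_2}\|\operatorname{trace}(X_i X_i^T)$$
$$\le C \cdot \max_{i,j}\|X_{ij}\|^2 \operatorname{trace}\Big(\sum_{i=1}^n X_i X_i^T\Big) = O(np_n^2),$$

by assumptions (A1) and (A3). This implies that $\sup_{\|b_n\| = 1} |b_n^T \bar E_{1n}(\beta_{n0}) b_n| =
O_p(\sqrt{n}\,p_n)$. Next, we have

$$|b_n^T \bar E_{2n}(\beta_n) b_n| = \Big|\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^m (1 - 2\pi_{ij}(\beta_{n0}))\,\varepsilon_{ij}(\beta_{n0})\, b_n^T X_i^T[A_i^{1/2}(\beta_n) - A_i^{1/2}(\beta_{n0})]\bar R^{-1} e_j e_j^T X_i b_n\Big|$$
$$\le C\sum_{i=1}^n\sum_{j=1}^m |b_n^T X_i^T[A_i^{1/2}(\beta_n) - A_i^{1/2}(\beta_{n0})]\bar R^{-1} e_j| \cdot |e_j^T X_i b_n|$$
$$\le C\sum_{i=1}^n \|X_i b_n\|^2\,\lambda_{\max}(\bar R^{-1})\max_j |A_{ij}^{1/2}(\beta_n) - A_{ij}^{1/2}(\beta_{n0})|.$$

Note that there exists some $\beta_n^*$ between $\beta_n$ and $\beta_{n0}$ such that

$$|A_{ij}^{1/2}(\beta_n) - A_{ij}^{1/2}(\beta_{n0})| = \Big|\frac{1}{2}A_{ij}^{1/2}(\beta_n^*)(1 - 2\pi_{ij}(\beta_n^*))X_{ij}^T(\beta_n - \beta_{n0})\Big| \le C\|X_{ij}\| \cdot \|\beta_n - \beta_{n0}\|.$$

Therefore,

$$\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}}\ \sup_{\|b_n\| = 1} |b_n^T \bar E_{2n}(\beta_n) b_n| \le C\max_{i,j}\|X_{ij}\|\ \sup\|\beta_n - \beta_{n0}\| \cdot \lambda_{\max}\Big(\sum_{i=1}^n X_i^T X_i\Big)$$
$$= O(\sqrt{p_n})\,O(\sqrt{p_n/n})\,O(n) = O(\sqrt{n}\,p_n).$$

Similarly, we can show that $\sup_{\|\beta_n - \beta_{n0}\| \le \Delta\sqrt{p_n/n}} \sup_{\|b_n\| = 1} |b_n^T \bar E_{kn}(\beta_n) b_n| =
O(\sqrt{n}\,p_n)$, $k = 3, 4$. This proves (A.4). Similarly, we can prove (A.5). □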
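
The mean-value step in this proof uses the derivative $\frac{d}{d\theta}\sqrt{\pi(1-\pi)} = \frac{1}{2}\sqrt{\pi(1-\pi)}\,(1 - 2\pi)$, whose absolute value is bounded, so that $\sqrt{A_{ij}}$ is Lipschitz in $\theta = X_{ij}^T\beta_n$. The sketch below checks this numerically on a grid. The constant 1/8 used here is derived in the comments and is offered as one admissible value of the generic constant $C$, not as the constant used by the author; the function names and grid are illustrative.

```python
import math


def sqrt_variance(theta):
    """sqrt(A(theta)) with A = pi (1 - pi) and pi the logistic function."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return math.sqrt(p * (1.0 - p))


def sqrt_variance_derivative(theta):
    """Closed form: 0.5 * sqrt(pi (1 - pi)) * (1 - 2 pi)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return 0.5 * math.sqrt(p * (1.0 - p)) * (1.0 - 2.0 * p)


# With u = pi (1 - pi), the squared slope is 0.25 * u * (1 - 4u), which is
# maximized at u = 1/8 and gives a maximal slope of 1/8.
LIPSCHITZ = 0.125

grid = [i / 10.0 for i in range(-80, 81)]
max_slope = max(abs(sqrt_variance_derivative(t)) for t in grid)

# Largest observed difference quotient over neighbouring grid points.
max_quotient = max(abs(sqrt_variance(s) - sqrt_variance(t)) / abs(s - t)
                   for s, t in zip(grid[:-1], grid[1:]))
```

Both `max_slope` and `max_quotient` stay below the stated constant on this grid, consistent with the bound $|A_{ij}^{1/2}(\beta_n) - A_{ij}^{1/2}(\beta_{n0})| \le C\|X_{ij}\| \cdot \|\beta_n - \beta_{n0}\|$.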

Proof of Lemma 3.5. The proof is given in the online supplementary 
material. □ 
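Proof of Lemma 3.7. We write $\alpha_n^T \bar M_n^{-1/2}(\beta_{n0})\bar S_n(\beta_{n0}) = \sum_{i=1}^n Z_{ni}$,
where $Z_{ni} = \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1}\varepsilon_i(\beta_{n0})$. Since $\bar M_n(\beta_{n0}) =
\operatorname{Cov}(\bar S_n(\beta_{n0}))$, we have $\operatorname{Var}(\alpha_n^T \bar M_n^{-1/2}(\beta_{n0})\bar S_n(\beta_{n0})) = 1$. To establish the
asymptotic normality, it suffices to check the Lindeberg condition, that is,
$\forall \epsilon > 0$, $\sum_{i=1}^n E[Z_{ni}^2 I(|Z_{ni}| > \epsilon)] \to 0$. By the Cauchy-Schwarz inequality,

$$Z_{ni}^2 \le \|\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) X_i^T A_i^{1/2}(\beta_{n0})\bar R^{-1}\|^2 \cdot \|\varepsilon_i(\beta_{n0})\|^2$$
$$\le \lambda_{\max}(\bar R^{-2})\,\lambda_{\max}(A_i(\beta_{n0}))\,\big(\alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) X_i^T X_i \bar M_n^{-1/2}(\beta_{n0})\alpha_n\big) \times \|\varepsilon_i(\beta_{n0})\|^2$$
$$\le C\gamma_{ni}\|\varepsilon_i(\beta_{n0})\|^2,$$

where $\gamma_{ni} = \alpha_n^T \bar M_n^{-1/2}(\beta_{n0}) X_i^T X_i \bar M_n^{-1/2}(\beta_{n0})\alpha_n$. Next, we will show that
$\max_{1 \le i \le n}\gamma_{ni} \to 0$ as $n \to \infty$. Note that $\gamma_{ni} \le \lambda_{\max}(X_i^T X_i)\,\lambda_{\min}^{-1}(\bar M_n(\beta_{n0}))$.
Since $\bar M_n(\beta_{n0})$ is symmetric, to evaluate $\lambda_{\min}(\bar M_n(\beta_{n0}))$, $\forall b_n \in R^{p_n}$, we
have

$$b_n^T \bar M_n(\beta_{n0}) b_n \ge \lambda_{\min}(R_0)\,\lambda_{\min}(\bar R^{-2})\sum_{i=1}^n \lambda_{\min}(A_i(\beta_{n0}))\, b_n^T X_i^T X_i b_n$$
$$\ge C b_n^T\Big(\sum_{i=1}^n X_i^T X_i\Big) b_n \ge C\lambda_{\min}\Big(\sum_{i=1}^n X_i^T X_i\Big)\|b_n\|^2.$$

Thus, $\inf_{\|b_n\| = 1} |b_n^T \bar M_n(\beta_{n0}) b_n| \ge C\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)$ and this implies that
$\lambda_{\min}(\bar M_n(\beta_{n0})) \ge C\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)$. Therefore, we have

$$\gamma_{ni} \le \frac{\lambda_{\max}(X_i^T X_i)}{C\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)} \le \frac{\mathrm{Tr}(X_i^T X_i)}{C\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)} = \frac{\sum_{j=1}^m \|X_{ij}\|^2}{C\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)}.$$

It follows that $\max_{1 \le i \le n}\gamma_{ni} \le O(n^{-1}p_n) = o(1)$. We have

$$\sum_{i=1}^n E[Z_{ni}^2 I(|Z_{ni}| > \epsilon)] \le \sum_{i=1}^n C\gamma_{ni}\, E\Big[\|\varepsilon_i(\beta_{n0})\|^2\, I\Big\{\|\varepsilon_i(\beta_{n0})\|^2 > \frac{\epsilon^2}{C\gamma_{ni}}\Big\}\Big].$$

Note that $\|\varepsilon_i(\beta_{n0})\|^2$ is uniformly bounded, by assumption (A2). Thus,
for all $\epsilon > 0$ and $\delta > 0$, there exists a positive integer $N$ such that (1)
$I\{\|\varepsilon_i(\beta_{n0})\|^2 > \epsilon^2/(C\gamma_{ni})\} = 0$ for all $n > N$; (2) $\sum_{i=1}^n \cdots < \delta$ for all $n$
sufficiently large. This ensures that

$$\sum_{i=1}^n C\gamma_{ni}\, E\Big[\|\varepsilon_i(\beta_{n0})\|^2\, I\Big\{\|\varepsilon_i(\beta_{n0})\|^2 > \frac{\epsilon^2}{C\gamma_{ni}}\Big\}\Big] \to 0.$$

Therefore, the Lindeberg condition is verified. □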
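Proof of Theorem 3.10. It is sufficient to show that for $b_n \in R^{p_n}$,

(A.6)    $\sup_{\|b_n\| = 1} |b_n^T(\hat\Sigma_n - \Sigma_n) b_n| = o_p(n^{-1}).$

We use the conclusion of Theorem 3.6 throughout the proof. Note that we
can write $\hat\Sigma_n - \Sigma_n = I_{n1} + I_{n2} + I_{n3}$, where

$$I_{n1} = H_n^{-1}(\hat\beta_n)[\hat M_n(\hat\beta_n) - \bar M_n(\beta_{n0})]H_n^{-1}(\hat\beta_n),$$
$$I_{n2} = [H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\beta_{n0})]\bar M_n(\beta_{n0})H_n^{-1}(\hat\beta_n),$$
$$I_{n3} = \bar H_n^{-1}(\beta_{n0})\bar M_n(\beta_{n0})[H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\beta_{n0})].$$

Thus, (A.6) is implied by $\sup_{\|b_n\| = 1} |b_n^T I_{nj} b_n| = o_p(n^{-1})$, $j = 1, 2, 3$. We have

$$\sup_{\|b_n\| = 1} |b_n^T I_{n1} b_n| \le \frac{\max(|\lambda_{\max}(\hat M_n(\hat\beta_n) - \bar M_n(\beta_{n0}))|,\ |\lambda_{\min}(\hat M_n(\hat\beta_n) - \bar M_n(\beta_{n0}))|)}{\lambda_{\min}^2(H_n(\hat\beta_n))}.$$

To evaluate the eigenvalues of $\hat M_n(\hat\beta_n) - \bar M_n(\beta_{n0})$, we have

$$|c_n^T[\hat M_n(\hat\beta_n) - \bar M_n(\beta_{n0})]c_n| \le |c_n^T[\hat M_n(\hat\beta_n) - \hat M_n(\beta_{n0})]c_n| + |c_n^T[\hat M_n(\beta_{n0}) - \bar M_n(\beta_{n0})]c_n|$$

for $c_n \in R^{p_n}$. Note that

$$\sup_{\|c_n\| = 1} |c_n^T[\hat M_n(\hat\beta_n) - \hat M_n(\beta_{n0})]c_n|$$
$$\le \sup_{\|c_n\| = 1}\Big|\sum_{i=1}^n c_n^T X_i^T[A_i^{1/2}(\hat\beta_n) - A_i^{1/2}(\beta_{n0})]\hat R^{-1}\varepsilon_i(\hat\beta_n)\varepsilon_i^T(\hat\beta_n)\hat R^{-1}A_i^{1/2}(\hat\beta_n) X_i c_n\Big|$$
$$\quad + \sup_{\|c_n\| = 1}\Big|\sum_{i=1}^n c_n^T X_i^T A_i^{1/2}(\beta_{n0})\hat R^{-1}\varepsilon_i(\hat\beta_n)\varepsilon_i^T(\hat\beta_n)\hat R^{-1}[A_i^{1/2}(\hat\beta_n) - A_i^{1/2}(\beta_{n0})] X_i c_n\Big|$$
$$\quad + \sup_{\|c_n\| = 1}\Big|\sum_{i=1}^n c_n^T X_i^T A_i^{1/2}(\beta_{n0})\hat R^{-1}[\varepsilon_i(\hat\beta_n)\varepsilon_i^T(\hat\beta_n) - \varepsilon_i(\beta_{n0})\varepsilon_i^T(\beta_{n0})]\hat R^{-1}A_i^{1/2}(\beta_{n0}) X_i c_n\Big|$$
$$=: \sup_{\|c_n\| = 1} J_{n1} + \sup_{\|c_n\| = 1} J_{n2} + \sup_{\|c_n\| = 1} J_{n3}.$$

Note that

$$J_{n1} \le \sum_{i=1}^n \|c_n^T X_i^T[A_i^{1/2}(\hat\beta_n) - A_i^{1/2}(\beta_{n0})]\| \cdot \|\hat R^{-1}\varepsilon_i(\hat\beta_n)\|^2 \cdot \|A_i^{1/2}(\hat\beta_n) X_i c_n\|.$$

We have $\|A_i^{1/2}(\hat\beta_n) X_i c_n\| \le \|X_i c_n\|$ and

$$\|c_n^T X_i^T[A_i^{1/2}(\hat\beta_n) - A_i^{1/2}(\beta_{n0})]\| \le \|X_i c_n\|\max_j |A_{ij}^{1/2}(\hat\beta_n) - A_{ij}^{1/2}(\beta_{n0})| \le C\|X_i c_n\| \cdot \max_j\|X_{ij}\| \cdot \|\hat\beta_n - \beta_{n0}\|.$$

Furthermore,

$$\|\hat R^{-1}\varepsilon_i(\hat\beta_n)\|^2 = (Y_i - \pi_i(\hat\beta_n))^T A_i^{-1/2}(\hat\beta_n)\hat R^{-2}A_i^{-1/2}(\hat\beta_n)(Y_i - \pi_i(\hat\beta_n))$$
$$\le \lambda_{\max}(\hat R^{-2})\,\lambda_{\max}(A_i^{-1}(\hat\beta_n))\,\|Y_i - \pi_i(\hat\beta_n)\|^2 \le C\,O_p(1).$$

Thus,

$$\sup_{\|c_n\| = 1} J_{n1} \le O_p(1)\,\|\hat\beta_n - \beta_{n0}\|\max_{i,j}\|X_{ij}\|\,\lambda_{\max}\Big(\sum_{i=1}^n X_i^T X_i\Big) = o_p(n).$$

Similarly, $\sup_{\|c_n\| = 1} J_{n2} = o_p(n)$ and $\sup_{\|c_n\| = 1} J_{n3} = o_p(n)$. Thus,

$$\sup_{\|c_n\| = 1} |c_n^T[\hat M_n(\hat\beta_n) - \hat M_n(\beta_{n0})]c_n| = o_p(n).$$

Similarly, $\sup_{\|c_n\| = 1} |c_n^T[\hat M_n(\beta_{n0}) - \bar M_n(\beta_{n0})]c_n| = o_p(n)$. Finally, note that

$$\lambda_{\min}(H_n(\hat\beta_n)) \ge \lambda_{\min}(\hat R^{-1})\min_{i,j}\big(\pi_{ij}(\hat\beta_n)(1 - \pi_{ij}(\hat\beta_n))\big)\,\lambda_{\min}\Big(\sum_{i=1}^n X_i^T X_i\Big) = O_p(n).$$

Thus, $\sup_{\|b_n\| = 1} |b_n^T I_{n1} b_n| = o_p(n^{-1})$. We can also prove that $\sup_{\|b_n\| = 1} |b_n^T I_{ni} b_n| = o_p(n^{-1})$,
$i = 2, 3$, by first noting that

$$H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\beta_{n0}) = [H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\hat\beta_n)] + [\bar H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\beta_{n0})]$$

and then using the expressions

$$H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\hat\beta_n) = \bar H_n^{-1}(\hat\beta_n)[\bar H_n(\hat\beta_n) - H_n(\hat\beta_n)]H_n^{-1}(\hat\beta_n),$$
$$\bar H_n^{-1}(\hat\beta_n) - \bar H_n^{-1}(\beta_{n0}) = \bar H_n^{-1}(\beta_{n0})[\bar H_n(\beta_{n0}) - \bar H_n(\hat\beta_n)]\bar H_n^{-1}(\hat\beta_n).$$

□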
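
The last step of this proof relies on the matrix identity $A^{-1} - B^{-1} = B^{-1}(B - A)A^{-1}$, which holds for any two invertible matrices of the same size. The sketch below verifies it numerically for a pair of 2 x 2 matrices; the matrices and helper names are arbitrary illustrative choices and have no connection to the quantities $H_n$ and $\bar H_n$ beyond sharing the algebra.

```python
def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def subtract(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


A = [[4.0, 1.0], [1.0, 3.0]]
B = [[5.0, 0.5], [0.5, 2.0]]

lhs = subtract(inverse_2x2(A), inverse_2x2(B))
rhs = matmul(matmul(inverse_2x2(B), subtract(B, A)), inverse_2x2(A))
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The identity turns a difference of inverses into a product that contains the difference of the matrices themselves, which is why bounds on $\bar H_n - H_n$ transfer to bounds on the inverses.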

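Proof of Corollary 3.11. It is sufficient to show that

(A.7)    $[(L_n\hat\Sigma_n L_n^T)^{-1/2} - (L_n\Sigma_n L_n^T)^{-1/2}]L_n(\hat\beta_n - \beta_{n0}) \to 0$

in probability. Note that the left-hand side can be written as

$$[(L_n\hat\Sigma_n L_n^T)^{-1/2}(L_n\Sigma_n L_n^T)^{1/2} - I_l](L_n\Sigma_n L_n^T)^{-1/2}L_n(\hat\beta_n - \beta_{n0})$$

and thus (A.7) is implied by

$$(L_n\hat\Sigma_n L_n^T)^{-1}(L_n\Sigma_n L_n^T) - I_l = (L_n\hat\Sigma_n L_n^T)^{-1}L_n(\Sigma_n - \hat\Sigma_n)L_n^T = o_p(1).$$

Let $u_i$ denote the $l \times 1$ unit vector with the $i$th element being 1 and all
of the other elements being 0. Then, for all $1 \le i, j \le l$, we have, by the
Cauchy-Schwarz inequality,

$$|u_i^T(L_n\hat\Sigma_n L_n^T)^{-1}L_n(\Sigma_n - \hat\Sigma_n)L_n^T u_j| \le |u_i^T(L_n\hat\Sigma_n L_n^T)^{-2}u_i|^{1/2}\,|u_j^T[L_n(\hat\Sigma_n - \Sigma_n)L_n^T]^2 u_j|^{1/2}$$
$$\le \frac{\|L_n(\hat\Sigma_n - \Sigma_n)L_n^T\|}{\lambda_{\min}(L_n\hat\Sigma_n L_n^T)}.$$

Now, for any $l$-dimensional vector $b$ such that $\|b\| = 1$, we have

$$|b^T L_n\hat\Sigma_n L_n^T b| \ge |b^T L_n\Sigma_n L_n^T b| - |b^T L_n(\hat\Sigma_n - \Sigma_n)L_n^T b|$$
$$\ge \frac{\lambda_{\min}(\bar M_n(\beta_{n0}))}{\lambda_{\max}^2(\bar H_n(\beta_{n0}))} + o_p(n^{-1}),$$

where the second inequality uses Theorem 3.10. By (3.9), $\lambda_{\min}(\bar M_n(\beta_{n0})) \ge
c_1\lambda_{\min}(\sum_{i=1}^n X_i^T X_i)$ for some positive constant $c_1$. Similarly, we can show
that $\lambda_{\max}(\bar H_n(\beta_{n0})) \le c_2\lambda_{\max}(\sum_{i=1}^n X_i^T X_i)$ for some positive constant $c_2$.
Thus, $\lambda_{\min}(L_n\hat\Sigma_n L_n^T) \ge O_p(n^{-1})$. This proves that $\|L_n(\hat\Sigma_n - \Sigma_n)L_n^T\|/\lambda_{\min}(L_n\hat\Sigma_n L_n^T) = o_p(1)$,
by Theorem 3.10. □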

Acknowledgments. The author would like to thank the Associate Ed- 
itor and two referees for their constructive and insightful comments that 
SUPPLEMENTARY MATERIAL

Supplement to "GEE analysis of clustered binary data with diverging 
number of covariates" (DOI: 10.1214/10-AOS846SUPP; .pdf). The proofs of 
(3.3), Lemma 3.5, (3.11) and Theorem 5.1 are provided in this supplementary 
article [Wang (2010)]. 